Citation
Principal angles between subspaces as related to Rayleigh quotient and Rayleigh-Ritz inequalities with applications to eigenvalue accuracy and an eigenvalue solver

Material Information

Title:
Principal angles between subspaces as related to Rayleigh quotient and Rayleigh-Ritz inequalities with applications to eigenvalue accuracy and an eigenvalue solver
Creator:
Argentati, Merico Edward
Publication Date:
2003
Language:
English
Physical Description:
165 leaves ; 28 cm

Subjects

Subjects / Keywords:
Rayleigh quotient ( lcsh )
Angles (Geometry) ( lcsh )
Canonical correlation (Statistics) ( lcsh )
Eigenvalues ( lcsh )
Angles (Geometry) ( fast )
Canonical correlation (Statistics) ( fast )
Eigenvalues ( fast )
Rayleigh quotient ( fast )
Genre:
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

Notes

Bibliography:
Includes bibliographical references (leaves 161-165).
General Note:
Department of Mathematical and Statistical Sciences
Statement of Responsibility:
by Merico Edward Argentati.

Record Information

Source Institution:
University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
54522667 ( OCLC )
ocm54522667
Classification:
LD1190.L622 2003d A73 ( lcc )

Full Text
PRINCIPAL ANGLES BETWEEN SUBSPACES AS RELATED TO
RAYLEIGH QUOTIENT AND RAYLEIGH-RITZ INEQUALITIES WITH
APPLICATIONS TO EIGENVALUE ACCURACY AND AN EIGENVALUE
SOLVER
by
Merico Edward Argentati
B.S., Worcester Polytechnic Institute, 1970
M.S., University of California at San Diego, 1972
M.S., University of Colorado at Boulder, 1977
A thesis submitted to the
University of Colorado at Denver
in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
Applied Mathematics
2003


This thesis for the Doctor of Philosophy
degree by
Merico Edward Argentati
has been approved
by
Andrew Knyazev
Jan Mandel
Date


Argentati, Merico Edward (Ph.D., Applied Mathematics)
Principal Angles Between Subspaces as Related to Rayleigh Quotient and
Rayleigh-Ritz Inequalities with Applications to Eigenvalue Accuracy and an
Eigenvalue Solver
Thesis directed by Professor Andrew Knyazev
ABSTRACT
In this dissertation we focus on three related areas of research:
1) principal angles (sometimes denoted as canonical angles) between subspaces
including theory and numerical results, 2) Rayleigh quotient and Rayleigh-Ritz
perturbations and eigenvalue accuracy, and 3) parallel software implementation
and numerical results concerning the eigenvalue solver LOBPCG (Locally Op-
timal Block Preconditioned Conjugate Gradient Method) [35], using parallel
software libraries and interface specifications based on the Lawrence Livermore
National Laboratory, Center for Applied Scientific Computing (LLNL-CASC)
High Performance Preconditioners (Hypre) project.
Concerning principal angles or canonical angles between subspaces, we pro-
vide some theoretical results and discuss how current codes compute the cosine
of principal angles, thus making it impossible, because of round-off errors, to find
small angles accurately. We propose several MATLAB based algorithms that
compute both small and large angles accurately, and illustrate their practical
robustness with numerical examples. We prove basic perturbation theorems for


absolute errors for sine and cosine of principal angles with improved constants.
MATLAB release 13 has implemented our SVD-based algorithm for the sine.
Secondly, we discuss Rayleigh quotient and Rayleigh-Ritz perturbations and
eigenvalue accuracy. Several completely new results are presented. One of the
interesting findings characterizes the perturbation of Ritz values for a symmetric
positive definite matrix and two different subspaces, in terms of the spectral
condition number and the largest principal angle between the subspaces.
The final area of research involves the parallel implementation of the
LOBPCG algorithm using Hypre, which involves the computation of eigenvec-
tors that are computed by optimizing Rayleigh quotients with the conjugate
gradient method. We discuss a flexible matrix-free parallel algorithm and
performance on several test problems. This LOBPCG Hypre software has been
integrated into LLNL Hypre Beta Release Hypre-1.8.0b.
This abstract accurately represents the content of the candidate's thesis. I
recommend its publication.
Signed
Andrew Knyazev


DEDICATION
I dedicate this thesis to my dear wife Shannon and my shining, creative daughter
Angela who have supported my research and long hours of study over the years,
in spite of my strange obsession with mathematics.
I also dedicate this thesis to my mother and father for their love and support
over the years, and especially for their encouragement that one can do anything
that one wants to if you approach each task with enough confidence and opti-
mism. I also dedicate this thesis to my brothers Jim and John who have always
shown support and interest in my work, even though they were far removed
from the details.
Finally, I also would like to dedicate this thesis to my uncle Jack Morrissey
who inspired me toward science and mathematics, when I was ten years old, by
showing me the moon through his 8 inch Newtonian reflecting telescope.


ACKNOWLEDGMENT
I would like to thank my advisor, Andrew Knyazev, for his support and guidance
during all stages of my research and dissertation preparation.
This work was in part supported by NSF Award DMS 0208773, Precon-
ditioned Algorithms for Large Eigenvalue Problems, PI Andrew Knyazev. A
portion of this work, involving the scalable eigensolver, was supported by
Lawrence Livermore National Laboratory, Center for Applied Scientific Comput-
ing (LLNL-CASC). I would like to thank Rob Falgout, Charles Tong, Panayot
Vassilevski and other members of the LLNL-CASC team for their help.
I would also like to thank several CU-Denver graduate students, includ-
ing Sean Jenson, who helped to test the SUBSPACEA.m code, and Chan-
Chai Aniwathananon and Saulo Oliveira, who participated in testing the code
ORTHA.m. Rapheal Bar-Or has been very helpful concerning the development
of enhancements to algorithms for computing angles between subspaces. Tessa
Weinstein has also been a great source of ideas and encouragement which have
contributed to progress in my general research.


CONTENTS
Figures ................................................................ x
Tables................................................................. xi
Algorithms............................................................ xii
Chapter
1. Introduction......................................................... 1
1.1 Overview, Motivation and Organization ........................... 1
1.2 Notation........................................................... 6
2. Principal Angles Between Subspaces................................... 8
2.1 Introduction to Principal Angles................................... 8
2.2 Inaccuracy in the Cosine-Based Algorithm.......................... 17
2.3 Properties of Principal Angles and a
Sine-Based Algorithm ............................................. 18
2.4 Generalization to an A-Based Scalar Product....................... 31
2.5 Perturbation of Principal Angles in the
A-Based Scalar Product............................................ 39
2.6 Algorithm Implementation.......................................... 51
2.7 Numerical Examples................................................ 57
2.8 Availability of the Software ..................................... 72
2.9 Conclusions Concerning Angles Between
Subspaces......................................................... 73
3. Rayleigh Quotient Perturbations and
Eigenvalue Accuracy................................................. 75


3.1 Introduction to Rayleigh Quotient
Perturbations.................................................... 75
3.2 Subspaces and the Rayleigh-Ritz Procedure........................ 79
3.3 Analysis in 2-Dimensions ........................................ 85
3.3.1 Eigenvalue Accuracy in 2-Dimensions ........................... 88
3.3.2 Rayleigh Quotient Perturbations in
2-Dimensions................................................... 89
3.4 Analysis in the General Case .................................... 90
3.4.1 Eigenvalue Accuracy............................................ 90
3.4.2 Rayleigh Quotient Perturbations................................ 94
3.5 Bounds for Rayleigh-Ritz Estimates .............................. 99
3.5.1 Eigenvalue Approximation by Ritz Values........................ 99
3.5.2 Perturbation of Ritz Values................................... 100
3.6 Rayleigh Quotient Perturbations using a
Matrix Inner Product............................................ 101
3.7 Conclusions Concerning Rayleigh Quotient
Perturbations................................................... 106
4. Implementation of a Preconditioned
Eigensolver Using Hypre.......................................... 108
4.1 Introduction to the Preconditioned
Eigenvalue Solver............................................... 108
4.2 Background Concerning the LOBPCG
Algorithm....................................................... 109
4.3 The Block LOBPCG Algorithm...................................... 109
4.4 LOBPCG Hypre Implementation Strategy............................ 116
4.5 Software Implementation Details ................................ 118


4.6 Applications Program Interface (API) ......................... 120
4.7 Compiling and Testing......................................... 120
4.8 Numerical Results ........................................... 121
4.9 Conclusions Concerning the Hypre LOBPCG
Implementation................................................ 128
5. Overall Conclusions and Impact.................................. 130
5.1 Overall Conclusions........................................... 130
5.2 Impact........................................................ 133
6. Future Work Directions.......................................... 135
Appendices
A MATLAB Code for SUBSPACEA.M and ORTHA.M....................... 136
B LOBPCG Hypre API Functions.................................... 141
C Primary LOBPCG Functions lobpcg.c........................... 143
References......................................................... 161


FIGURES
Figure
2.1 Errors in principal angles as functions of n/2 62
2.2 Errors in individual angles in algorithms 2.2 and 2.4 with A = I . . 65
2.3 Errors in individual angles between F2 and G2 and F3 and G3 . . . 67
2.4 μ_k as functions of l and errors as functions of condition number . . 70
4.1 LOBPCG Hypre software modules..................................... 119
4.2 LOBPCG scalability as problem size increases...................... 123
4.3 Execution time to converge as problem size increases.............. 124
4.4 Performance versus quality of preconditioner...................... 125
4.5 Convergence rates for different preconditioners................... 127


TABLES
Table
2.1 Accuracy of computed angles..................................... 18
2.2 Computed sine and cosine of principal angles of the example of [5] 59
4.1 LOBPCG test script............................................... 121
4.2 Timing results for scalability .................................... 122
4.3 Scalability data for 3-D Laplacian ............................. 124


ALGORITHMS
Algorithm
2.1 SUBSPACE.m.......................................... 21
2.2 Modified and Improved SUBSPACE.m.................... 24
2.3 SUBSPACEA.m......................................... 54
2.4 Modified and Improved SUBSPACEA.m................... 56
4.1 Block LOBPCG Algorithm............................. 110
4.2 Block LOBPCG Algorithm in Hypre.................... 114


1. Introduction
1.1 Overview, Motivation and Organization
In this dissertation we focus on three related areas of research:
1) principal angles (sometimes denoted as canonical angles) between subspaces
including theory and numerical results, 2) Rayleigh quotient and Rayleigh-Ritz
perturbations and eigenvalue accuracy, and 3) parallel software implementation
and numerical results concerning the eigenvalue solver LOBPCG (Locally Op-
timal Block Preconditioned Conjugate Gradient Method) [35], using parallel
software libraries and interface specifications based on the Lawrence Livermore
National Laboratory, Center for Applied Scientific Computing (LLNL-CASC)
High Performance Preconditioners (Hypre) project.
The framework for this research embodies finite dimensional real vector
spaces, vectors, subspaces, angles between vectors/subspaces, orthogonal pro-
jectors, symmetric and/or positive definite operators, eigenvalue problems,
Rayleigh quotients and Rayleigh-Ritz approximations.
In Chapter 2 we discuss principal angles between subspaces. Computation
of principal angles or canonical angles between subspaces is important in many
applications including statistics [8], information retrieval [43], and analysis of
algorithms [52]. This concept allows us to characterize or measure, in a natural
way, how two subspaces differ, which is the main connection with perturbation
theory. When applied to column-spaces of matrices, the principal angles describe
canonical correlations of a matrix pair. We provide some theoretical results


and discuss how current codes compute the cosine of principal angles, thus making it
impossible, because of round-off errors, to find small angles accurately. We
review a combination of sine and cosine based algorithms that provide accurate
results for all angles, and include a generalization to an A-based scalar product
for a symmetric and positive definite matrix A. We propose several MATLAB
based algorithms that compute both small and large angles accurately, and
illustrate their practical robustness with numerical examples. We prove basic
perturbation theorems for absolute errors for sine and cosine of principal angles
with improved constants. Numerical examples and a description of our code are
given. MATLAB release 13 has implemented our SVD-based algorithm for the
sine.
A portion of this research, involving angles, was presented in several talks
including a talk at the Seventh Copper Mountain Conference on Iterative Meth-
ods, March 24-29, 2002, Copper Mountain: Principal Angles Between Subspaces
in an A-Based Scalar Product: Dense and Sparse Matrix Algorithms joint with
Andrew Knyazev, and another talk at the Sixth IMACS International Sympo-
sium on Iterative Methods in Scientific Computing March 27-30, 2003 Univer-
sity of Colorado at Denver: Principal Angles Between Subspaces in an A-Based
Scalar Product: Interesting Properties, Algorithms and Numerical Examples -
joint with Andrew Knyazev.
In Chapter 3 we discuss Rayleigh quotient and Rayleigh-Ritz perturbations
and eigenvalue accuracy. There are two reasons for studying this problem [52]:
first, the results can be used in the design and analysis of algorithms, and


second, a knowledge of the sensitivity of an eigenpair is of both theoretical and
practical interest. Rayleigh quotient and Rayleigh-Ritz equalities and inequalities
are central to determining eigenvalue accuracy and in analyzing many situations
when individual vectors and/or subspaces are perturbed. We address the case
of a perturbation of a general vector, as well as perturbations involving a general
vector and an eigenvector, and perturbations involving subspaces. The absolute
magnitude of these perturbations is a function of 1) the characteristics of the
matrix A (e.g., condition number), 2) the angles between vectors/subspaces,
and 3) orthogonal projectors onto the vectors/subspaces that are involved. So
the framework for angles, which is discussed in Chapter 2, is also important as
a background for Chapter 3.
Several completely new results are presented. One of the interesting findings
characterizes the perturbation of Ritz values for a symmetric positive definite
matrix and two different subspaces, in terms of the spectral condition number
and the largest principal angle between the subspaces. Thus, this is an area
where angles between subspaces can be used to obtain some elegant results.
The Rayleigh quotient is defined by
ρ(x) = ρ(x; A) = (Ax, x)/(x, x),
and we are often interested in bounding the absolute difference |ρ(x; A) - ρ(y; A)|,
when A ∈ R^{n×n} is a symmetric and/or positive definite matrix and x and y are
general vectors, or when one of them is an eigenvector. It is natural to charac-
terize these perturbations as some function of angles between individual vectors
and/or by the largest principal angle between the subspaces. In this investiga-


tion some relevant and interesting perturbations are discussed and proofs are
included. For example, we prove that
|ρ(x; A) - ρ(y; A)| / ρ(x; A) ≤ (κ(A) - 1) sin(∠{x, y}),
where κ(A) is the spectral condition number of A. Considering two subspaces
X, Y ⊂ R^n, we obtain the new result
|α_j - β_j| / α_j ≤ (κ(A) - 1) sin(∠{X, Y}),   j = 1, ..., m,
where α_j, β_j for j = 1, ..., m are the Ritz values of a positive definite matrix
A with respect to the subspaces X and Y, and ∠{X, Y} is the largest principal angle
between the subspaces X and Y.
In our analysis we first restrict the domain to a 2-D subspace. These 2-D
results are then extended to larger dimensional spaces. In the literature, this is
referred to as a mini-dimensional analysis [39]. This extension requires a care-
ful analysis of the properties of subspaces and Rayleigh-Ritz approximations,
which is provided. We also provide several alternative proofs, one of which uses
a somewhat unique approach of expressing the Rayleigh quotient as a Frobe-
nius inner product: ρ(x; A) = (A, P_x) = trace(A P_x), where P_x is the orthogonal
projector onto span{x}.
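As a small illustration of this identity, the following MATLAB fragment (a sketch
with hypothetical random test data, not part of the thesis code) checks numerically
that trace(A*Px) reproduces the Rayleigh quotient:

    n = 5;
    A = randn(n); A = A'*A + n*eye(n);   % symmetric positive definite test matrix
    x = randn(n,1);
    rho = (x'*A*x)/(x'*x);               % Rayleigh quotient rho(x; A)
    Px  = (x*x')/(x'*x);                 % orthogonal projector onto span{x}
    disp(abs(rho - trace(A*Px)))         % agrees up to round-off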
In Chapter 4 we discuss the LOBPCG eigensolver algorithm, in detail, and
the parallel software implementation of this algorithm. This final area of research
involves a project to develop a Scalable Preconditioned Eigenvalue Solver for the
solution of eigenvalue problems for large sparse symmetric matrices on massively
parallel computers. This work involves the implementation of the LOBPCG


algorithm, where eigenvectors are computed by optimizing the Rayleigh quotient
with the Conjugate Gradient Method. This Rayleigh quotient optimization uses
the Rayleigh-Ritz method, hence this is the connection with Chapter 3 of this
work.
The code implementation takes advantage of advances in the Scalable Linear
Solvers project, in particular in multigrid technology and in incomplete factor-
izations (ILU) developed under the Hypre project, at LLNL-CASC. The solver
allows for the utilization of Hypre preconditioners for symmetric eigenvalue prob-
lems. We discuss a flexible matrix-free parallel algorithm, and the capabilities
of the developed software. We gain a large amount of leverage through use of
the Hypre parallel library functions that handle the construction of a flexible set
of preconditioners, application of the preconditioned solve during execution of
the algorithm and other linear algebraic operations such as matrix-vector multi-
plies, scalar products, etc. In the LOBPCG algorithm we implement soft locking,
whereby certain vectors may be eliminated from the Rayleigh-Ritz procedure at
each iteration. This is done somewhat differently than the locking that is
discussed in [2]. This entire development amounts to approximately 4,000 lines
of non-commentary source code. We also discuss performance on several test
problems. The LOBPCG Hypre software has been integrated into the Hypre
software at LLNL and is part of the Hypre Beta Release Hypre-1.8.0b.
A portion of the work, concerning the implementation of a Hypre version of
LOBPCG, was presented at Eleventh Copper Mountain Conference on Multigrid
Methods, March 30-April 4, 2003: Implementation of a Scalable Preconditioned


Eigenvalue Solver Using Hypre joint with Andrew Knyazev.
Many of the insights in this investigation stem from concrete examples and
problems, and special cases. For example, concerning Rayleigh quotient in-
equalities, the special case of 2-D analysis yields major results for the general
case. In the case of angles between subspaces, the simple example of the inaccu-
racy in the current algorithm provided insights to remedy the problem. Finally
the LOBPCG parallel implementation has benefited from code development of
MATLAB and C-language programs, and through a number of numerical ex-
periments using both of these earlier implementations.
1.2 Notation
Rn - Vector space of real n-tuples.
R^{n×m} - Vector space of real n × m matrices.
x, y, ... - Lower case Roman letters denote vectors.
A, B, ... - Upper case Roman letters denote matrices.
F, G, ... - Calligraphic letters denote subspaces of R^n.
(x, y) - Euclidean inner product (x, y) = x^T y, where x, y ∈ R^n.
||x|| - Euclidean norm ||x||^2 = (x, x), where x ∈ R^n.
||A|| - For a matrix this is the induced operator norm, which is
derived from the Euclidean norm and is also called the
spectral norm.
||x||_B - B-norm ||x||_B^2 = (Bx, x), where x ∈ R^n and B ∈ R^{n×n} is a
positive definite matrix.
||A||_B - For a matrix this is the induced operator norm, which is
derived from the B-norm.
(A, B) - For A, B ∈ R^{n×n} this is the Frobenius inner product
(A, B) = trace(A^T B).
κ(A) - Spectral condition number κ(A) = ||A|| ||A^{-1}|| = λ_max/λ_min
for A positive definite.
F^⊥ - The orthogonal complement of the subspace F.
F ⊖ G - The subspace F intersected with the orthogonal complement
of G, that is, F ⊖ G = F ∩ G^⊥.
ρ(x; A) - The Rayleigh quotient is defined for x ∈ R^n with x ≠ 0 by
ρ(x; A) = (Ax, x)/(x, x). If it is obvious which matrix A is
involved, the form ρ(x) may be used.
r(x; A) - The residual is defined for x ∈ R^n with x ≠ 0 by
r(x; A) = (A - ρ(x; A)I)x. If it is obvious which matrix A is
involved, the form r(x) may be used.
∠{x, y} - The acute angle between the vectors x, y ∈ R^n with x, y ≠ 0
is given by ∠{x, y} = arccos( |(x, y)| / (||x|| ||y||) ).
∠{F, G} - The largest principal angle between the subspaces
F, G ⊂ R^n.
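The following short MATLAB fragment (a sketch with assumed random test data, not
part of the thesis code) illustrates the Rayleigh quotient, the residual, and the
acute angle between two vectors as defined above:

    n = 4;
    A = randn(n); A = (A + A')/2;                      % symmetric test matrix
    x = randn(n,1);  y = randn(n,1);
    rho = (x'*A*x)/(x'*x);                             % Rayleigh quotient rho(x; A)
    r   = A*x - rho*x;                                 % residual r(x; A) = (A - rho(x;A)*I)*x
    ang = acos(min(1, abs(x'*y)/(norm(x)*norm(y))));   % acute angle in [0, pi/2]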


2. Principal Angles Between Subspaces
2.1 Introduction to Principal Angles
In this chapter we present concepts and results concerning angles between
subspaces. Given two non-zero vectors, the acute angle between the vectors x, y
∈ R^n with x, y ≠ 0 is denoted by
∠{x, y} = arccos( |(x, y)| / (||x|| ||y||) ).
It is important to emphasize that 0 ≤ ∠{x, y} ≤ π/2, by definition.
Considering two subspaces, we can recursively define a set of angles between
them, which are denoted as principal or canonical angles. Let us consider two
real-valued matrices F and G, each with n rows, and their corresponding column-
spaces F and G, which are subspaces in R^n, assuming that
p = dim F ≥ dim G = q ≥ 1.
Then the principal angles
θ_1, ..., θ_q ∈ [0, π/2]
between F and G may be defined, e.g., [29, 21], recursively for k = 1, ..., q by
cos(θ_k) = max_{u ∈ F} max_{v ∈ G} u^T v = u_k^T v_k
subject to
||u|| = ||v|| = 1,   u^T u_i = 0,   v^T v_i = 0,   i = 1, ..., k - 1.


The vectors u_1, ..., u_q and v_1, ..., v_q are called principal vectors. We first find θ_1
and the corresponding principal vectors u_1 and v_1. To obtain the second angle,
we search in subspaces that are orthogonal, respectively, to u_1 and v_1. Continuing
in this manner, always searching in subspaces orthogonal to principal vectors
that have already been found, we obtain the complete set of principal angles
and principal vectors.
Here and below || · || denotes the standard Euclidean norm of a vector or,
when applied to a matrix, the corresponding induced operator norm, also called
the spectral norm, of the matrix.
We are often interested in the largest principal angle θ_q. In this investigation
we will denote the largest principal angle by
θ_q = ∠{F, G}.
Definition 4.2.1 of [52] provides an equivalent, slightly more intuitive char-
acterization of the largest principal angle when p = q, which is given by
θ_q = max_{u ∈ F} min_{v ∈ G} ∠{u, v}.
According to [51, 47], the notion of canonical angles between subspaces goes
back to Jordan (1875). Principal angles between subspaces, and particularly the
smallest and the largest angles, serve as important tools in functional analysis
(see books [1, 20, 30] and a survey [14]) and in perturbation theory of invariant
subspaces, e.g., [9, 51, 48, 33, 42].
Computation of principal angles between subspaces is needed in many ap-
plications. For example, in statistics, the angles are closely related to measures


of dependency and covariance of random variables; see a canonical analysis of
[8]. When applied to column-spaces T and Q of matrices F and G, the principal
angles describe canonical correlations, which are important in applications such as system identification and information
retrieval. Principal angles between subspaces also appear naturally in compu-
tations of eigenspaces, e.g., [34, 35], where angles provide information about
solution quality and need to be computed with high accuracy.
In such large-scale applications, it is typical that n ≫ p; in other words,
informally speaking, we are dealing with a small number of vectors having a
large number of components. Because of this, we are interested in matrix-free
methods; i.e., no n-by-n matrices need to be stored in memory in our algorithms.
A singular value decomposition (SVD)-based algorithm [18, 5, 7, 21, 23]
for computing cosines of principal angles can be formulated as follows. Let
columns of matrices Q_F ∈ R^{n×p} and Q_G ∈ R^{n×q} form orthonormal bases for
the subspaces F and G, respectively. The reduced SVD of Q_F^T Q_G is
Y^T Q_F^T Q_G Z = diag(σ_1, σ_2, ..., σ_q),   σ_1 ≥ σ_2 ≥ ... ≥ σ_q ≥ 0,   (2.1)
where Y ∈ R^{p×q}, Z ∈ R^{q×q} both have orthonormal columns. Then the principal
angles can be computed as
θ_k = arccos(σ_k),   k = 1, ..., q,   (2.2)
where
0 ≤ θ_1 ≤ ... ≤ θ_q ≤ π/2,


while principal vectors are given by
u_k = Q_F y_k,   v_k = Q_G z_k,   k = 1, ..., q.
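A minimal MATLAB sketch of this cosine-based computation, assuming F and G are
given matrices with the same number of rows (this is an illustration, not the
SUBSPACE.m code itself), is:

    QF = orth(F);  QG = orth(G);          % orthonormal bases of the column-spaces
    [Y, S, Z] = svd(QF'*QG, 0);           % reduced SVD (2.1); singular values are cosines
    theta = acos(min(1, diag(S)));        % principal angles (2.2), inaccurate when small
    U = QF*Y;  V = QG*Z;                  % left and right principal vectors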
The equivalence [5, 21] of the original geometric definition of principal an-
gles and the SVD-based approach follows from the next general theorem on an
equivalent representation of singular values.
Theorem 2.1 If M ∈ R^{m×n}, then the singular values of M are defined recur-
sively by
σ_k := max_{y ∈ R^m} max_{z ∈ R^n} y^T M z = y_k^T M z_k,   k = 1, ..., min{m, n},   (2.3)
subject to
||y|| = ||z|| = 1,   y^T y_i = 0,   z^T z_i = 0,   i = 1, ..., k - 1.   (2.4)
The vectors y_i and z_i are, respectively, left and right singular vectors.
Proof: The proof of the theorem is straightforward if based on Allakhverdiev's
representation (see [20]) of singular numbers, and using the well-known formula
of the induced Euclidean norm of a matrix as the norm of the corresponding
bilinear form.
To apply the theorem to principal angles, one takes M = Q_F^T Q_G.


In the most recent publication on the subject, [16], the SVD-based algorithm
for cosine is proved to be stable, but QR factorizations with complete pivoting
are recommended for computing Qp and Qc to improve stability.
The SVD-based algorithm for cosine was considered as standard and is
implemented in software packages, e.g., in MATLAB, version 5.3, 2000, code
SUBSPACE.m, revision 5.5,1 where Qp £ RnXp and Qg £ RnXq are computed
using the QR factorization.
However, this algorithm cannot provide accurate results for small angles
because of the presence of round-off errors. Namely, when using the standard
double-precision arithmetic with EPS ≈ 10^{-16}, the algorithm fails to accurately
compute angles smaller than 10^{-8} (see section 2.2). The problem has been
highlighted in the classical paper [5], and a cure has been suggested (see
also publications on cosine-sine (CS) decomposition methods [50, 54, 51, 47]),
but apparently it did not attract enough attention.
In statistics, most software packages include a code for computing σ_k =
cos(θ_k), which are called canonical correlations; see, e.g., CANCOR Fortran code
in FIRST MDS Package of AT&T, CANCR (DCANCR) Fortran subroutine in
IMSL STAT/LIBRARY, G03ADF Fortran code in NAG package, CANCOR
subroutine in Splus, and CANCORR procedure in SAS/STAT Software. While
accurately computing the cosine of principal angles in corresponding precision,
these codes do not compute the sine. However, the cosine simply equals one in
1Revision 5.8 of SUBSPACE.m in MATLAB release 12.1, version 6.1.0.450, May 18, 2001,
is still identical to revision 5.5, which we have used for numerical tests in this investigation.
MATLAB release 13 has implemented the SVD-based algorithm for sine.


double precision for all angles smaller than 10^{-8} (see the next section). Therefore,
it is impossible in principle to observe an improvement in canonical correla-
tions for angles smaller than 10^{-8} in double precision. It might not be typically
important when processing experimental statistical data because the expected
measurement error may be so great that a statistician would deem the highly
correlated variable essentially redundant and therefore not useful as a further
explanatory variable in their model. Statistical computer experiments are dif-
ferent, however, as there is no measurement error, so accurate computation of
very high correlations may be important in such applications.
The largest principal angle is related to the notion of distance, or a gap,
between equidimensional subspaces. If p = q, the distance is defined [1, 20, 21,
30] as
gap(F, G) = ||P_F - P_G|| = sin(θ_q) = √(1 - (cos(θ_q))^2),   (2.5)
where P_F and P_G are orthogonal projectors onto F and G, respectively.
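This relation is easy to check numerically; the MATLAB fragment below (a sketch
with assumed random equidimensional test subspaces) compares ||P_F - P_G|| with the
sine of the largest principal angle obtained from the cosines:

    n = 20;  p = 3;
    QF = orth(randn(n,p));  QG = orth(randn(n,p));
    PF = QF*QF';  PG = QG*QG';                    % orthogonal projectors onto F and G
    cosines = svd(QF'*QG);                        % cosines of the principal angles
    gap = norm(PF - PG);                          % spectral norm of the difference
    disp(abs(gap - sin(acos(min(1, cosines(end))))))   % ~ round-off for generic data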
This formulation provides insight into a possible alternative algorithm for
computing the sine of principal angles. The corresponding algorithm, described
in [5], while being mathematically equivalent to the previous one in exact arith-
metic, is accurate for small angles in computer arithmetic as it computes the
sine of principal angles directly, without using SVD (2.1) leading to the cosine.
We review the algorithm of [5] based on a general form of (2.5) in section 2.3
and suggest an improved version, similar to the CS decomposition algorithm of
[54], with the second SVD of the reduced size.


The CS decomposition methods, e.g., [50, 54, 51, 47], one of which we just
mentioned, provide a well-known and popular alternative approach for com-
puting principal angles between subspaces given by selected p (q) columns of
orthogonal matrices of the size n. For example, if the matrix Q_{F⊥}, with or-
thonormal columns that span the subspace F^⊥, the orthogonal complement
of F, is available to us, the CS decomposition methods compute the SVD of
(Q_F)^T Q_G together with the SVD of (Q_{F⊥})^T Q_G, thus providing cosine and sine
of principal angles. When p is of the same order as n/2, matrix Q_{F⊥} is about
the same size as matrix Q_F, and the CS decomposition methods are effective
and are recommended for practical computations. However, when n ≫ p, the
CS decomposition methods, explicitly using matrix Q_{F⊥} of the size n-by-(n - p),
will be less efficient compared to matrix-free methods we consider in this in-
vestigation. Let us highlight that the cosine- and sine-based methods of [5] that
we investigate here in section 2.3, while different algorithmically from the CS
decomposition methods, are very close mathematically to them.
A different sine-based approach, using eigenvalues of P_F - P_G, is described
in [7, 47]; see a similar statement in Theorem 2.5. It is also not attractive
numerically, when n ≫ p, as it requires computing an n-by-n matrix and finding
all its nonzero eigenvalues.
In some applications, e.g., when solving symmetric generalized eigenvalue
problems [34], the default scalar product u^T v cannot be used and needs to
be replaced with an A-based scalar product (u, v)_A = u^T A v, where A is a
symmetric positive definite matrix. In statistics, a general scalar product for


computing canonical correlations gives a user an opportunity, for example, to
take into account a priori information that some vector components are more
meaningful than others. In a purely mathematical setting, generalization to
A-based scalar products brings nothing really new. In practical computations,
however, it carries numerous algorithmic and numerical problems, especially for
ill-conditioned cases, which are important in applications.
In section 2.4, we propose extension of the algorithms to an A-based scalar
product and provide the corresponding theoretical justification.
In section 2.5, we turn our attention to perturbation estimates, which gener-
alize the following trigonometric inequalities: if an angle θ ∈ [0, π/2] is perturbed
by ε ∈ [0, π/2] such that θ + ε ∈ [0, π/2], then
0 ≤ cos(θ) - cos(θ + ε) ≤ sin(θ + ε) sin(ε) ≤ sin(ε),
0 ≤ sin(θ + ε) - sin(θ) ≤ cos(θ) sin(ε) ≤ sin(ε).
We prove new absolute perturbation estimates for the sine and cosine of principal
angles computed in the A-based scalar product. When A = I, our estimates
are similar to those of [5, 55, 53, 23, 22], but the technique we use is different.
More importantly, our constants are somewhat better, in fact, in the same way
the constants in the middle terms of the trigonometric inequalities above are
less than one.
We consider a particular implementation of the algorithms used in our MATLAB
code SUBSPACEA.m in section 2.6, with emphasis on the large-scale case, n ≫ p,
and a sparse ill-conditioned matrix A, which may be specified only as a function
that multiplies A by a given vector. When matrices F and G are sparse, our


code can still be used even though it performs orthogonalization of columns of
matrices F and G that increases the fill-in; cf. [23]. Also, we do not even touch
here upon a practically important issue of the possibility of recomputing the
correlations with an increase in the data; see again [23].
Finally, numerical results, presented in section 2.7, demonstrate the practical
robustness of our code.
For simplicity, we discuss only real spaces and real scalar products; however,
all results can be trivially generalized to cover complex spaces as well. In fact,
our code SUBSPACEA.m is written for the general complex case.
As was pointed out in [40], there are several natural questions that are of
interest, but are not considered here.
Our algorithms are based on SVD. How does SVD accuracy (cf. [4, 12, 11,
16]), especially for small singular values, or in ill-conditioned cases, affect
the results?
In [16], a formal stability analysis is done for the SVD-based algorithm
for cosine, which is proved to be mixed stable. In our numerical tests,
practical robustness of our algorithms is encouraging. Are our methods
accurate and stable theoretically, e.g., see [26]?
For A-based scalar products, how does the increase of the condition num-
ber of A influence the accuracy? Which parts of our algorithm are respon-
sible for the main error growth?


We feel, however, that investigating these matters is not within the scope of
this work. They may rather serve as interesting directions for future research.
2.2 Inaccuracy in the Cosine-Based Algorithm
Let d be a constant and
F = span{ (1, 0)^T },   G = span{ (1, d)^T }.
Then the angle between the one-dimensional subspaces F and G can be com-
puted as
θ = arcsin( d / √(1 + d^2) ).   (2.6)
In the table below d varies from one to small values. Formula (2.6) is
accurate for small angles, so we use it as an exact answer in the second column
of the table. We use the MATLAB built-in function SUBSPACE.m (revision
5.5) which implements (2.1) to compute values for the third column of the table.
It is apparent that SUBSPACE.m returns inaccurate results for d < 10^{-8},
which is approximately √EPS for double precision. In this simple one-
dimensional example the algorithm of SUBSPACE.m is reduced to computing
θ = arccos( 1 / √(1 + d^2) ).
This formula clearly shows that the inability to compute accurately small angles
is integrated in the standard algorithm and cannot be fixed without changing
the algorithm itself. The cosine, that is, a canonical correlation, is computed
accurately and simply equals one for all positive d < 10^{-8}. However, one
cannot determine small angles from a cosine accurately in the presence of round-
off errors. In statistical terms, it illustrates the problem we already mentioned
above: the canonical correlation itself does not show any improvement in
correlation when d is smaller than 10^{-8} in double precision.

d         Formula (2.6)              SUBSPACE.m
1.0       7.853981633974483e-001     7.853981633974483e-001
1.0e-04   9.999999966666666e-005     9.999999986273192e-005
1.0e-06   9.999999999996666e-007     1.000044449242271e-006
1.0e-08   1.000000000000000e-008     -6.125742274543099e-017
1.0e-10   1.000000000000000e-010     -6.125742274543099e-017
1.0e-16   9.999999999999998e-017     -6.125742274543099e-017
1.0e-20   9.999999999999998e-021     -6.125742274543099e-017
1.0e-30   1.000000000000000e-030     -6.125742274543099e-017

Table 2.1: Accuracy of computed angles
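The loss of accuracy can be reproduced directly in MATLAB; the fragment below
(an illustration, not library code) evaluates the cosine- and sine-based formulas
for d = 1e-10:

    d = 1e-10;
    theta_cos = acos(1/sqrt(1 + d^2))   % cosine-based formula: returns 0 in double precision
    theta_sin = asin(d/sqrt(1 + d^2))   % sine-based formula (2.6): returns 1.0e-10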
In the next section, we consider a formula [5] that directly computes the
sine of principal angles as in (2.6).
2.3 Properties of Principal Angles and a
Sine-Based Algorithm
We first review known sine-based formulas for the largest principal angle.
Results of [1, 30] concerning the aperture of two linear manifolds give
||P_F - P_G|| = max { max_{x ∈ G, ||x||=1} ||(I - P_F)x||,  max_{x ∈ F, ||x||=1} ||(I - P_G)x|| }.   (2.7)
Let columns of matrices Q_F ∈ R^{n×p} and Q_G ∈ R^{n×q} form orthonormal
bases for the subspaces F and G, respectively. Then the orthogonal projectors onto
F and G are P_F = Q_F Q_F^T and P_G = Q_G Q_G^T, respectively, and
||P_F - P_G|| = max{ ||(I - Q_F Q_F^T) Q_G||,  ||(I - Q_G Q_G^T) Q_F|| }.   (2.8)


If p ≠ q, then the expression in (2.8) is always equal to one; e.g., if p > q, then the
second term under the maximum is one. If p = q, then both terms are the same
and yield sin(θ_q) by (2.5). Therefore, under our assumption p ≥ q, only the first
term is interesting to analyze. We note that the first term is the largest singular
value of (I - Q_F Q_F^T) Q_G. What is the meaning of the other singular values of the
matrix?
This provides an insight into how to find a sine-based formulation to obtain
the principal angles, which is embodied in the following theorem [5].
Theorem 2.2 Singular values μ_1 ≤ μ_2 ≤ ... ≤ μ_q of the matrix (I - Q_F Q_F^T) Q_G
are given by μ_k = √(1 - σ_k^2), k = 1, ..., q, where σ_k are defined in (2.1). More-
over, the principal angles satisfy the equalities θ_k = arcsin(μ_k).
The right principal vectors can be computed as
v_k = Q_G z_k,   k = 1, ..., q,
where z_k are corresponding orthonormal right singular vectors of the matrix
(I - Q_F Q_F^T) Q_G. The left principal vectors are then computed by
u_k = Q_F Q_F^T v_k / σ_k   if σ_k ≠ 0,   k = 1, ..., q.
Proof: Our proof is essentially the same as that of [5]. We reproduce it here
for completeness as we use a similar proof later for a general scalar product.
Let B = (I - P_F) Q_G = (I - Q_F Q_F^T) Q_G. Using the fact that I - P_F is a
projector and that Q_G^T Q_G = I, we have
B^T B = Q_G^T (I - P_F)(I - P_F) Q_G = Q_G^T (I - P_F) Q_G
= I - Q_G^T Q_F Q_F^T Q_G.
Utilizing the SVD (2.1), we obtain Q_F^T Q_G = Y Σ Z^T, where
Σ = diag(σ_1, σ_2, ..., σ_q); then
Z^T B^T B Z = I - Σ^2 = diag(1 - σ_1^2, 1 - σ_2^2, ..., 1 - σ_q^2).
Thus, the singular values of B are given by μ_k = √(1 - σ_k^2), k = 1, ..., q, and
the formula for the principal angles θ_k = arcsin(μ_k) follows directly from (2.2).
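A minimal MATLAB sketch of this sine-based formulation (assuming Q_F and Q_G are
orthonormal bases already computed; an illustration only, not the code of
Algorithm 2.1) is:

    B = QG - QF*(QF'*QG);               % (I - QF*QF')*QG without forming an n-by-n matrix
    [Yb, M, Zb] = svd(B, 0);            % singular values mu_k are sines of the angles
    theta = asin(min(1, diag(M)));      % accurate for the small principal angles
    V = QG*Zb;                          % right principal vectors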
We can now use the theorem to formulate an algorithm for computing all
the principal angles. This approach meets our goal of a sine-based formulation,
which should provide accurate computation of small angles. However, for large
angles we keep the cosine-based algorithm.


Algorithm 2.1: SUBSPACE.m
Input: real matrices F and G with the same number of rows.
1. Compute orthonormal bases Q_F = orth(F), Q_G = orth(G) of column-spaces
of F and G.
2. Compute SVD for cosine: [Y, Σ, Z] = svd(Q_F^T Q_G), Σ = diag(σ_1, ..., σ_q).
3. Compute matrices of left U_cos = Q_F Y and right V_cos = Q_G Z principal
vectors.
4. Compute the matrix
B = Q_G - Q_F(Q_F^T Q_G) if rank(Q_F) ≥ rank(Q_G);
B = Q_F - Q_G(Q_G^T Q_F) otherwise.
5. Compute SVD for sine: [Y, diag(μ_1, ..., μ_q), Z] = svd(B).
6. Compute matrices U_sin and V_sin of left and right principal vectors:
V_sin = Q_G Z, U_sin = Q_F(Q_F^T V_sin) Σ^{-1} if rank(Q_F) ≥ rank(Q_G);
U_sin = Q_F Z, V_sin = Q_G(Q_G^T U_sin) Σ^{-1} otherwise.
7. Compute the principal angles, for k = 1, ..., q:
θ_k = arccos(σ_k) if σ_k^2 < 1/2;
θ_k = arcsin(μ_k) if μ_k^2 ≤ 1/2.
8. Form matrices U and V by picking up corresponding columns of U_sin, V_sin
and U_cos, V_cos, according to the choice for θ_k above.
Output: Principal angles θ_1, ..., θ_q between column-spaces of matrices F and G,
and corresponding matrices U and V of left and right principal vectors,
respectively.
Remark 2.1 In step 1 of the algorithm, the orthogonalization can be performed
using the QR method or the SVD. In our actual code, an SVD-based built-in
MATLAB function ORTH.m is used for the orthogonalization.
It is pointed out in [16] that errors in computing Q_F and Q_G, especially
expected for ill-conditioned F and G, may lead to irreparable damage in the final


answers. A proper column scaling of F and G could in some cases significantly
reduce condition numbers of F and G. We highlight that an explicit columnwise
normalization of matrices F and G is not required prior to orthogonalization if
a particular orthogonalization algorithm used here is invariant under column
scaling in finite precision arithmetic. Our numerical tests show that the explicit
column scaling is not needed if we utilize a built-in MATLAB function QR.m for
orthonormalization. However, the explicit column scaling apparently helps to
improve the accuracy when the SVD-based built-in MATLAB function ORTH.m
is used for orthonormalization. In [16], QR factorizations with complete pivoting
are recommended for computing Qp and Qq.
Remark 2.2 The check of rank(Q_F) ≥ rank(Q_G) in steps 4 and 6 of the algo-
rithm removes the need for our assumption p = dim F ≥ dim G = q.
Remark 2.3 We replace here in step 4
(I - Q_F Q_F^T) Q_G = Q_G - Q_F(Q_F^T Q_G),   (I - Q_G Q_G^T) Q_F = Q_F - Q_G(Q_G^T Q_F)
to avoid storing in memory any n-by-n matrices in the algorithm, which allows
us to compute principal angles efficiently for large n ≫ p as well.
If the matrix Q_{F⊥}, with orthonormal columns that span the subspace F^⊥,
the orthogonal complement of F, was available to us when p ≥ q, we could take
here
B = Q_{F⊥}^T Q_G,
as in the CS decomposition methods; see [50, 54, 51, 47]. Under our assumption
n ≫ p, however, the matrix Q_{F⊥} is essentially of the size n and thus shall be


avoided.
The 1/2 threshold used in Algorithm 2.1 in steps 7 and 8 to separate small
and large principal angles and corresponding vectors seems to be a natural
choice. However, such an artificial fixed threshold may cause troubles with
orthogonality in the resulting choice of vectors in step 8 if there are several an-
gles close to each other but on different sides of the threshold. The problem
is that the corresponding principal vectors, picked up from two orthogonal sets
computed by different algorithms, may not be orthogonal. A more accurate
approach would be to identify such possible cluster of principal angles around
the original threshold and to make sure that all principal vectors corresponding
to the cluster are chosen according to either step 3, or step 6, but not both.


Algorithm 2.2: Modified and Improved SUBSPACE.m
Input: real matrices F and G with the same number of rows.
1. Compute orthonormal bases Qp = orth(F), Qc = orth(G) of column-spaces
of F and G.
2. Compute SVD for cosine: [Y, Σ, Z] = svd(Q_F^T Q_G), Σ = diag(σ_1, ..., σ_q).
3. Compute matrices of left U_cos = Q_F Y and right V_cos = Q_G Z principal
vectors.
4. Compute large principal angles, for k = 1, ..., q:
θ_k = arccos(σ_k) if σ_k^2 < 1/2.
5. Form parts of matrices U and V by picking up corresponding columns of
U_cos, V_cos, according to the choice for θ_k above. Put the columns of U_cos, V_cos
which are left in matrices R_F and R_G. Collect the corresponding σ's in a
diagonal matrix Σ_R.
6. Compute the matrix B = R_G - Q_F(Q_F^T R_G).
7. Compute SVD for sine: [Y, diag(μ_1, ..., μ_q), Z] = svd(B).
8. Compute matrices U_sin and V_sin of left and right principal vectors:
V_sin = R_G Z, U_sin = R_F(R_F^T V_sin) Σ_R^{-1}.
9. Recompute the small principal angles, for k = 1, ..., q:
θ_k = arcsin(μ_k) if μ_k^2 ≤ 1/2.
10. Complete matrices U and V by adding the columns of U_sin, V_sin.
Output: Principal angles θ_1, ..., θ_q between column-spaces of matrices F and
G, and corresponding matrices U and V of left and right principal vectors,
respectively.
Let us repeat that, in exact arithmetic, the sine and cosine based approaches
give the same results; e.g., columns of U_sin and V_sin must be the same as those
of U_cos and V_cos. Why do we need to recompute essentially the same vectors a
second time? What if we compute only U_cos, V_cos and then recompute just the small


principal angles using, e.g., the obvious formula
μ_k = ||u_k - σ_k v_k||?   (2.9)
This approach was discussed in [40]. It was suggested that it would resolve
the inaccuracy in the cosine-based algorithm illustrated in the previous section,
without the need for the second SVD.
The answer is that the cosine-based algorithm fails to compute accurately
not only the small principal angles but also the corresponding principal vectors.
The reason for this is that singular values computed in step 2 of Algorithm 2.1
are the cosines of principal angles, while singular values of the matrix B in step
5 are the sines of principal angles. Thus, the distribution of singular values is
different in steps 2 and 5; e.g., singular values corresponding to small angles
are much better separated in step 5 than in step 2. For example, angles 10^{-10}
and 10^{-12} will produce a multiple singular value 1 in step 2 in double precision
but will produce two distinct small singular values in step 5. This means that
singular vectors, corresponding to small principal angles, might not be computed
accurately in computer arithmetic using only SVD in step 2, which will also lead
to inaccurate computation of the small principal angles by formula (2.9). Our
numerical tests support this conclusion.
There is some obvious redundancy in Algorithm 2.1. Indeed, we do not need
to calculate columns of U_sin and V_sin, corresponding to large sines, and columns
of U_cos and V_cos, corresponding to large cosines, as they are computed inaccu-
rately in computer arithmetic and we just discard them later in the algorithm.
However, first, for large-scale applications with n ≫ p that we are interested in,


the redundancy is insignificant. Second, this allows us to perform steps 2-3 and
steps 4-5 of Algorithm 2.1 independently in parallel. We note that Σ must be
invertible in Algorithm 2.1.
For sequential computations, we now describe Algorithm 2.2. Here, we re-
duce computational costs of the second SVD by using already computed vectors
Ucos and Kos for the cosines. The cosine-based algorithm computes inaccurately
individual principal vectors corresponding to small principal angles. However, it
may find accurately the corresponding invariant subspaces spanned by all these
vectors. Thus, the idea is that, usingT7C0S and Vcos, we can identify invariant
subspaces in T and Q, which correspond to all small principal angles. Then,
we perform the second SVD only for these subspaces, computing only columns
of Usin and that we actually need, which may significantly reduce the size
of the matrix in the second SVD. This idea is used in the CS decomposition
algorithm of [54],
We keep steps 1-3 of Algorithm 2.1 unchanged but modify accordingly later
steps to obtain Algorithm 2.2. Such changes may significantly reduce the size
of matrix B and, thus, the costs, if large principal angles outnumber small ones;
e.g., if there are no small principal angles at all, the algorithm simply stops at
step 3. We note that matrix Σ_R is always invertible, unlike matrix Σ.
Remark 2.4 By construction, matrices R_F and R_G have the same number of
already orthogonal columns, which removes the need for orthogonalization and
for comparing, in step 6, their ranks.
Remark 2.5 We have three, equivalent in exact arithmetic, possibilities to com-


pute the matrix B:
B = (I - R_F R_F^T) R_G = R_G - R_F(R_F^T R_G) = R_G - Q_F(Q_F^T R_G).
The first formula is ruled out to avoid storing in memory any n-by-n matrices
in the algorithm. Our numerical tests show that the third expression, though
somewhat more expensive than the second one, often provides more accurate
results in the presence of round-off errors.
To summarize, Algorithms 2.1 and 2.2 use the cosine-based formulation
(2.1), (2.2) for large angles and the sine-based formulation of Theorem 2.2 for
small angles, which allows accurate computation of all angles. The algorithms
are reasonably efficient for large-scale applications with n ≫ p and are more
robust than the original cosine-based only version.
In the rest of the section, we describe some useful properties of principal
angles not yet mentioned. In this investigation, we follow [33] and make use of
an orthogonal projectors technique [25]. For an alternative approach, popular
in matrix theory, which is based on representation of subspaces in a canonical
CS form, we refer to [51].
Theorem 2.2 characterizes singular values of the product (I - P_F) Q_G, which
are the sines of the principal angles. What are the singular values of the matrix
P_F Q_G? A trivial modification of the previous proof leads to the following not
really surprising result that these are the cosine of the principal angles.
Theorem 2.3 Singular values of the matrix Q_F Q_F^T Q_G are exactly the same as
σ_k, defined in (2.1).


We conclude this section with other simple and known (e.g., [55, 51, 56]) sine
and cosine representations of principal angles and principal vectors, this time
using orthogonal projectors P_F and P_G onto subspaces F and G, respectively.
Theorem 2.4 Let assumptions of Theorem 2.2 be satisfied. Then σ_1 ≥ σ_2 ≥
... ≥ σ_q are the q largest singular values of the matrix P_F P_G; in particular,
σ_1 = ||P_F P_G||.
The other n - q singular values are all equal to zero.
Remark 2.6 As singular values of P_F P_G are the same as those of P_G P_F, sub-
spaces F and G play symmetric roles in Theorem 2.4; thus, our assumption that
p = dim F ≥ dim G = q is irrelevant here.
Theorem 2.5 Let assumptions of Theorem 2.2 be satisfied. Then μ_1 ≤ μ_2 ≤
... ≤ μ_q are the q largest singular values of the matrix (I - P_F) P_G; in particular,
μ_q = ||(I - P_F) P_G||.
The other n - q singular values are all equal to zero.
Remark 2.7 Comparing Theorems 2.4 and 2.5 shows trivially that the sines of prin-
cipal angles between F and G are the same as the cosines of principal angles between
F^⊥ and G because I - P_F is an orthogonal projector onto F^⊥. If p > n/2 > q,
it may be cheaper to compute principal angles between F^⊥ and G instead of
principal angles between F and G.


What can we say about singular values of the matrix (I - P_G) P_F? In other
words, how do cosines of principal angles between subspaces F^⊥ and G compare
to cosines of principal angles between their orthogonal complements F and G^⊥?
If p = q, they are absolutely the same; in particular, the minimal angle between
subspaces F^⊥ and G is in this case the same as the minimal angle between
their orthogonal complements F and G^⊥, e.g., in [14], and, in fact, is equal to
gap(F, G) = ||P_F - P_G|| as we already discussed. When p > q, subspaces F
and G^⊥ must have a nontrivial intersection because the sum of their dimensions
is too big; thus, the minimal angle between subspaces F and G^⊥ must be zero
in this case, which corresponds to ||(I - P_G) P_F|| = 1, while ||(I - P_F) P_G|| may
be less than one. To be more specific, dim(F ∩ G^⊥) ≥ p - q; thus, at least
p - q singular values of the matrix (I - P_G) P_F are equal to one. Then, we have
the following statement, which completely clarifies the issue of principal angles
between orthogonal complements; cf. Ex. 1.2.6 of [7].
Theorem 2.6 The set of singular values of (I - P_G) P_F, when p > q, consists
of p - q ones, q singular values of (I - P_F) P_G, and n - p zeros.
In particular, this shows that the smallest positive sine of principal angles
between F and G, called the minimum gap, is the same as that between F^⊥ and
G^⊥; see [30]. This theorem can also be used to reduce the costs of computing
the principal angles between subspaces F and G, when their dimensions p and q
are greater than n/2, by replacing F and G with their orthogonal complements.
Let us finally mention a simple property of principal vectors, emphasized in
[55], which helps us to understand a geometric meaning of pairs of corresponding


principal vectors from different subspaces.
Theorem 2.7 We have
P_F v_k = σ_k u_k,   P_G u_k = σ_k v_k,   k = 1, ..., q,
and
u_i^T v_j = (P_F u_i)^T v_j = u_i^T P_F v_j = σ_j u_i^T u_j = σ_j δ_ij,   i, j = 1, ..., q.
In other words, a chosen pair u_k, v_k spans a subspace, invariant with respect to
the orthoprojectors P_F and P_G and orthogonal to all other such subspaces. The kth
principal angle θ_k is simply the angle between u_k and v_k; see (2.9).
Moreover, the subspace span{u_k, v_k} is also invariant with respect to the or-
thoprojectors I - P_F and I - P_G. Let us define two other unit vectors in this
subspace:
u_k^⊥ = (v_k - σ_k u_k)/μ_k ∈ F^⊥,   v_k^⊥ = (u_k - σ_k v_k)/μ_k ∈ G^⊥,
such that u_k^T u_k^⊥ = v_k^T v_k^⊥ = 0. Then
u_k, v_k are principal vectors for subspaces F and G;
u_k^⊥, v_k are principal vectors for subspaces F^⊥ and G;
u_k, v_k^⊥ are principal vectors for subspaces F and G^⊥;
u_k^⊥, v_k^⊥ are principal vectors for subspaces F^⊥ and G^⊥,
which concludes the description of all cases.
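The relations of Theorem 2.7 are easy to check numerically; the following MATLAB
fragment (with assumed random test subspaces, an illustration only) verifies
P_F v_k = σ_k u_k for the principal vectors computed by the SVD:

    n = 10;  p = 4;  q = 3;
    QF = orth(randn(n,p));  QG = orth(randn(n,q));
    [Y, S, Z] = svd(QF'*QG, 0);
    U = QF*Y;  V = QG*Z;                % principal vectors
    PF = QF*QF';                        % orthogonal projector onto F
    disp(norm(PF*V - U*S))              % P_F v_k = sigma_k u_k, up to round-off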


In the next section, we deal with an arbitrary scalar product.
2.4 Generalization to an A-Based Scalar Product
Let A ∈ R^{n×n} be a fixed symmetric positive definite matrix. Let (x, y)_A =
(x, Ay) = y^T A x be an A-based scalar product, x, y ∈ R^n. Let ||x||_A = √((x, x)_A)
be the corresponding vector norm and let ||B||_A be the corresponding induced
matrix norm of a matrix B ∈ R^{n×n}. We note that ||x||_A = ||A^{1/2} x|| and ||B||_A =
||A^{1/2} B A^{-1/2}||.
In order to define principal angles based on this scalar product, we will follow
arguments of [5, 21] but in an A-based scalar product instead of the standard
Euclidean scalar product. Again, we will assume for simplicity of notation that
p ≥ q.
Principal angles
θ_1, ..., θ_q ∈ [0, π/2]
between subspaces F and G in the A-based scalar product (·, ·)_A are defined
recursively for k = 1, ..., q by analogy with the previous definition for A = I as
cos(θ_k) = max_{u ∈ F} max_{v ∈ G} (u, v)_A = (u_k, v_k)_A   (2.10)
subject to
||u||_A = ||v||_A = 1,   (u, u_i)_A = 0,   (v, v_i)_A = 0,   i = 1, ..., k - 1.   (2.11)
The vectors u_1, ..., u_q and v_1, ..., v_q are called principal vectors relative to the
A-based scalar product.
The following theorem justifies the consistency of the definition above and
provides a cosine-based algorithm for computing the principal angles in the


A-based scalar product. It is a direct generalization of the cosine-based ap-
proach of [5, 21].
Theorem 2.8 Let columns of Q_F ∈ R^{n×p} and Q_G ∈ R^{n×q} now be A-
orthonormal bases for the subspaces F and G, respectively. Let σ_1 ≥ σ_2 ≥
... ≥ σ_q be singular values of Q_F^T A Q_G with corresponding left and right singular
vectors y_k and z_k, k = 1, ..., q. Then the principal angles relative to the scalar
product (·, ·)_A as defined in (2.10) and (2.11) are computed as
θ_k = arccos(σ_k),   k = 1, ..., q,   (2.12)
where
0 ≤ θ_1 ≤ ... ≤ θ_q ≤ π/2,
while the principal vectors are given by
u_k = Q_F y_k,   v_k = Q_G z_k,   k = 1, ..., q.
Proof: We first rewrite definition (2.10) and (2.11) of principal angles in the
following equivalent form. For k = 1,..., q,
cos(θ_k) = max_{y ∈ R^p} max_{z ∈ R^q} y^T Q_F^T A Q_G z = y_k^T Q_F^T A Q_G z_k
subject to
||y|| = ||z|| = 1,   y^T y_i = 0,   z^T z_i = 0,   i = 1, ..., k - 1,
where u = Q_F y ∈ F, v = Q_G z ∈ G and u_k = Q_F y_k ∈ F, v_k = Q_G z_k ∈ G.
Since Q_F and Q_G have A-orthonormal columns, Q_F^T A Q_F = I and Q_G^T A Q_G = I.
This implies
||u||_A^2 = y^T Q_F^T A Q_F y = y^T y = ||y||^2 = 1


and
||v||_A^2 = z^T Q_G^T A Q_G z = z^T z = ||z||^2 = 1.
For i ≠ j, we derive
(u_i, u_j)_A = y_i^T Q_F^T A Q_F y_j = y_i^T y_j = 0
and
(v_i, v_j)_A = z_i^T Q_G^T A Q_G z_j = z_i^T z_j = 0.
Now, let the reduced SVD of Q_F^T A Q_G be
Y^T Q_F^T A Q_G Z = diag(σ_1, σ_2, ..., σ_q),   (2.13)
where Y ∈ R^{p×q}, Z ∈ R^{q×q} both have orthonormal columns.
Then, by Theorem 2.1 with M = Q_F^T A Q_G, the equality cos(θ_k) = σ_k, k =
1, ..., q, just provides two equivalent representations of the singular values of
Q_F^T A Q_G, and y_k and z_k can be chosen as columns of matrices Y and Z, respec-
tively. The statement of the theorem follows.
Let us now make a trivial but important observation that links principal an-
gles in the A-based scalar product with principal angles in the original standard
scalar product, when a factorization A = K^T K, e.g., K = A^{1/2}, is available.
We formulate it as the following theorem.
Theorem 2.9 Let A = K^T K. Under assumptions of Theorem 2.8 the principal
angles between subspaces F and G relative to the scalar product (·, ·)_A coincide
with the principal angles between subspaces KF and KG relative to the original
scalar product (·, ·).


Proof: One way to prove this is to notice that our definition of the principal
angles between subspaces F and G relative to the scalar product (·, ·)_A turns into
a definition of the principal angles between subspaces KF and KG relative to
the original scalar product (·, ·) if we make the substitutions Ku ↦ u and Kv ↦ v.
Another proof is to use the representation
Q_F^T A Q_G = (K Q_F)^T (K Q_G),
where columns of matrices K Q_F and K Q_G are orthonormal with respect to the
original scalar product (·, ·) and span subspaces KF and KG, respectively. Now
Theorem 2.8 is equivalent to the traditional SVD theorem on cosines of principal
angles between subspaces KF and KG relative to the original scalar product
(·, ·), formulated in the introduction.
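Theorem 2.9 suggests a simple way to compute principal angles in an A-based scalar
product whenever a factorization A = K^T K is affordable; the MATLAB sketch below
(with hypothetical test data, using the built-in functions chol and subspace, the
latter returning only the largest principal angle) illustrates this:

    n = 12;
    A = randn(n);  A = A'*A + n*eye(n);   % symmetric positive definite test matrix
    F = randn(n,3);  G = randn(n,2);
    K = chol(A);                          % A = K'*K
    theta_max = subspace(K*F, K*G)        % largest principal angle in the A-based product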
The A-orthogonal projectors onto subspaces F and G are now defined by the
formulas
P_F = Q_F Q_F^{*_A} = Q_F Q_F^T A   and   P_G = Q_G Q_G^{*_A} = Q_G Q_G^T A,
where *_A denotes the A-adjoint.
To obtain a sine-based formulation in the A-based scalar product that is
accurate for small angles, we first adjust (2.5) and (2.7) to the new A-based
scalar product:
ga,pA(T,Q) = \\PF Pg\\a
= max {U*|U=i IK7 pf)x\\a, IK7 Pg)v\\a} (2-14)
34


If p = q, this equation will yield sin(^), consistent with Theorem 2.8. Similar
to the previous case A = I, only the first term under the maximum is of interest
under our assumption that p> q. Using the fact that
IMU = \\Qgz\\a = INI Vz G g, X- Qgz, z e Rq,
the term of interest can be rewritten as
maxze^u^i ||(/ PF)x\\A = \\A1/2(I QfQfa)Qg\\- (2-15)
Here we use the standard induced Euclidean norm || || for computational pur-
poses. Similar to our arguments, in the previous section, we obtain a more
general formula for all principal angles in the following.
Theorem 2.10 Let S = (/ QfQ:fA)Qg Singular values pi < H2 < < pq
of matrix Al!2S are given by //& = y/l a|, k = 1,... ,q, where ak are defined
in (2.13). Moreover, the principal angles satisfy the equalities Ok = arcsin(^).
The right principal vectors can be computed as
Vk = QcZk, k = 1,..., q,
where Zk are corresponding orthonormal right singular vectors of matrix A1/2S.
The left principal vectors are then computed by
Uk = QFQTFAvkjcFk ifak^ 0, k = 1,..., q.
Proof: We first notice that squares of the singular values Uk of the matrix
Al/2S, which appear in (2.15), coincide with eigenvalues i/& = p2 of the product
35


STAS. Using the fact that Q'pAQp = I and QqAQg = I, we have
StAS = QTG{I-AQFQl)A{I-QFQTFA)QG
= I Q'qAQpQ^AQg-
Utilizing the SVD (2.13), we obtain QfpAQc = YYZT, where
S = diag ( ZtStASZ = I E2 = diag (1 a2, 1 a2,..., 1 a2q).
Thus, the eigenvalues of STAS are given by ^ = 1 k = 1,..., q, and the
formula for the principal angles follows directly from (2.12).
For computational reasons, when n is large, we need to avoid dealing with
the square root A1/2 explicitly. Also, A may not be available as a matrix but only
as a function performing the multiplication of A by a given vector. Fortunately,
the previous theorem can be trivially reformulated as follows to resolve this
issue.
Theorem 2.11 Eigenvalues Vi < < < uq of matrix STAS, where
S = (I QFQ'fA)Qg, are equal to Uk = 1 crl, k = 1,..., g, where ak
are defined in (2.13). Moreover, the principal angles satisfy the equalities
Ok = arcsin (ydT) k = 1,..., q. The right principal vectors can be computed as
Vk = QoZk, k = l,...,g,
where Zk are corresponding orthonormal right eigenvectors of matrix STAS. The
left principal vectors are then computed by
Uk = QFQFAvk/ak if 36


We can easily modify the previous proof to obtain the following analogue of
Theorem 2.3.
Theorem 2.12 Singular values of the matrix A1/1QpQrpAQo = JP^PpQg co-
incide with Ok, defined in (2.13).
It is also useful to represent principal angles using exclusively A-orthogonal
projectors Pp and Pq on subspaces T and Q, respectively, similarly to Theorems
2.4 and 2.5.
Theorem 2.13 Under assumptions of Theorem 2.8, cri > cr2 > > the q largest singular values of the matrix A1PPpPoA~lP; in particular,
\\PfPg\\a-
Other n q singular values are all equal to zero.
Proof: First, we rewrite
A^PpPaA-1'2 = A'pQpQlAQaQlAA-'P
= A1/2QF (A'PQpf A^Qc (A'KQb? .
As columns of matrices AlPQp and A1^Qg are orthonormal with respect to the
original scalar product (, ) bases of subspaces AlPp and AlPQ, respectively,
the last product is equal to the product of orthogonal (not A-orthogonal!) pro-
jectors P^-np and Pjpy 12q on subspaces Af^T and A1^1Q.
Second, we can now use Theorem 2.4 to state that cosine of principal angles
between subspaces All2Tr and AfPQ with respect to the original scalar product
37


(,) are given by the q largest singular values of the product Pa\iitPa\iiq
A^PpPcA-1/2.
Finally, we use Theorem 2.9 to conclude that these singular values are, in
fact, a-*,, k = 1,... ,q, i.e., the cosine of principal angles between subspaces T
and Q with respect to the A-based scalar product.
Theorem 2.14 Let assumptions of Theorem 2.11 be satisfied. Then pi < //2 <
< pbq are the q largest singular values of the matrix A1/2 (I Pp)PcA' lP; in
particular,
Pq=\\(I-PF)PG\\A.
The other n q singular values are all equal to zero.
Proof: We rewrite
Al/\I Pf)PgA~1/2 = (/ A1/2Qf (A1/2gF)T) A1/2Qg (A1/2Qg)T
= (I P^iiyf) PAl/lg
and then follow arguments similar to those of the previous proof, but now using
Theorem 2.5 instead of Theorem 2.4.
Remarks 2.6-2.7 and Theorems 2.6-2.7 for the case A = I hold in the general
case, too, with obvious modifications.
Our final theoretical results are perturbation theorems in the next section.
38


2.5 Perturbation of Principal Angles in the
ABased Scalar Product
In the present section, for simplicity, we always assume that matrices F, G
and their perturbations F, G have the same rank; thus, in particular, p = q.
We notice that F and G appear symmetrically in the definition of the prin-
cipal angles, under our assumption that they and their perturbations have the
same rank. This means that we do not have to analyze the perturbation of F
and G together at the same time. Instead, we first study only a perturbation in
G.
Before we start with an estimate for cosine, let us introduce a new notation
0 using an example:
(G + G)eQ = (G + G)ng,
where and the orthogonal complement to Q are understood in the A-based
scalar product.
Lemma 2.1 Let 0"i > cr2 > > cr9 and di > > aq be cosine of
principal angles between subspaces T, Q and J-, Q, respectively, computed in the
A-based scalar product. Then, for k = 1,..., q,
K-dfc| < max[cos(6>mill{(C/ + Q) 0 Q, J7}); (2.16)
cos (flminK^ + Q) 0 G, 7''})]SaPA(^>
where 0mjn is the smallest angle between corresponding subspaces, measured in
the A-based scalar product.
39


Proof: The proof is based on the following identity:
AiI2QfQtfAQg = A1/2QFQFAQGQGAQG + A1/2QFQFA(I QGQGA)QG,
(2.17)
which is a multidimensional analogue of the trigonometric formula for the cosine
of the sum of two angles. Now we use two classical theorems on perturbation of
singular values with respect to addition:
sk(T + S) and with respect to multiplication:
sfc(TS'T) where T and S are matrices of corresponding sizes. We first need to take T =
A1/2QfQtfAQgQtgAQg and S = Al!2QFQTFA{I QGQTGA)QG in (2.18) to get
+ \\A^QfQtfA(I QgQtgA)Qgii,
where the first equality follows from Theorem 2.12. In the second term in the
sum on the right, we need to estimate a product, similar to a product of three
orthoprojectors. We notice that column vectors of (I QGQGA)QG belong
to the subspace (Q + Q) Q. Let P^g+g)Qg be an A-orthogonal projector on
the subspace. Then the second term can be rewritten, also using the projector
QfQfA = PF, as
AWQfQtfA{I QgQtgA)Qg = AlPQFQTFAP{g+g)eg{I QGQTGA)QG
40


= (AWpFP^a)egA~W) A^(I QoQlA)Q6-,
therefore, it can be estimated as
WA^QpQlAil QgQtgA)Qs\\ <
P1/2-Pi'f(5+6)e^_1/2HP1/2(-f QoQlA)Qa||.
The first multiplier in the last product equals
\\A1^PpP(s+g)eg-'4-1/2|l = \\PpPmi)ee\\A = + S) 0 g, ^}),
similar to (2.15) and using Theoreni 2.13 for subspaces (Q + Q) Q and T\
while the second multiplier is gap A (£/, dimC? = dim£. To estimate the first term in the sum, we apply (2.19) with
T = A^QfQIAQg and ST = QTGAQG.
SkiA^QFQ^AQGQaAQc) < Sk(A1/2QFQFAQG)\\QGAQcl\
< S}.(A1/2QfQ'fAQg) = crfc,
simply because the second multiplier here is the cosine of an angle between Q
and Q in the A-based scalar product, which is, of course, bounded by one from
above. Thus, we just proved
dfc < min{(£ + Q) 9, ^r})gapj4(^, Q).
Changing places of QG and Qg, we obtain
o-fc < + cos(f9min{(£ + Q) G, -^DgapA(G, Q)
and come to the statement of the lemma.
41


Remark 2.8 Let us try to clarify the meaning of constants appearing in the
statement of Lemma 2.1. Let us consider, e.g., + G) 9 G, IF}).
The cosine takes its maximal value, one, when at least one direction of the
perturbation of Q is A-orthogonal to Q and parallel to F at the same time. It
is small, on the contrary, when a part of the perturbation, A-orthogonal to Q, is
also A-orthogonal to IF. As (Q + Q) Q C QL, we have
cos+ G)QG, IF}) < cos {G1, IF}) (2.20)
= sin IF})
= gap a(G,IF),
which is the constant of the asymptotic perturbation estimate of [5] (where A =
I). The latter constant is small if subspaces Q and F are close to each other,
which can be considered more as a fortunate cancellation, as in this case cosine
of all principal angles is almost one, and a perturbation estimate for the cosine
does not help much because of the cancellation effect.
Remark 2.9 A natural approach similar to that of [23] with A = I involves a
simpler identity:
Q'fAQq = Q'pAQo + Q'pA{Qq Qg)->
where a norm df the second term is then estimated. Then (2.18) gives an esti-
mate of singular values using \\A1^{Qq Qg)\\- As singular values are invariant
with respect to particular choices of matrices Qq and Qa with A-orthonormal
columns, as far as they provide ranges G and G, respectively, we can choose them
42


to minimize the norm of the difference, which gives
tafPI/2Wa-Q,5<3)ll, (2.21)

where Q is an arbitrary q-by-q orthogonal matrix. This quantity appears in [23]
with A = I as a special type of the Procrustes problem. In [23], it is estimated
in terms of the gap between subspaces Q and Q (using an extra assumption that
2q < n). Repeating similar arguments, we derive
Wk~ffk\ < mi\\Al/2{QG-Q6Q)\\A < \/2gapA(0,6O, k = l,...,q. (2.22)
Lemma 2.1 furnishes estimates of the-perturbation of singular values in terms
of the gap directly, which gives a much better constant, consistent with that of
the asymptotic estimate of [5] for A = I; see the previous remark.
Now we prove a separate estimate for sine.
Lemma 2.2 Let Hi < < < nq and < fa < < p,q be sine of
principal angles between subspaces T, Q, and T, Q, respectively, computed in
the A-based scalar product. Then, for k = 1,..., q,
\Hk~h\ < max[sin(0xnax{(^ + £)(?> -T7});
shi(0max{((5 + Q) Q, ^"})]gaPJ4(0,Q), (2.23)
where 0max the largest angle between corresponding subspaces, measured in
the A-based scalar product.
Proof: The proof is based on the following identity:
A1/2{I QfQfA)Qg = A1/2(I QfQtfA)QgQtgAQg
+ AX>\I QpQpA)(/ QgQtgA)Qg,
43


which is a multidimensional analogue of the trigonometric formula for the sine
of the sum of two angles. The rest of the proof is similar to that of Lemma 2.1.
We first need to take T = A1/2(/ QfQ'pA)QgQ'gAQq and S =
A1/2(J QpQ'pA){I QgQgA)Qg and use (2.18) to get
- QfQtfA)Q6) < sk(A^(I QFQTFA)QaQlAQe)
+ \\AV\I QfQtfA)(I QaQTaA)Qs||.
In the second term in the sum on the right, QpQpA = Pp and we deduce
A1/2(I QFQTFA)(I QgQga)Qg = aU2(I Ww-hW7 QoQaA)Qa
= (AW(I P)P(g+i)egA-1/2) (AV\I QaQlA)Q6) ,
using notation P(g+g)eg for the A-orthogonal projector on the subspace (Q +
Q) Q, introduced in the proof of Lemma 2.1. Therefore, the second term can
be estimated as
\\A1I2(I QfQfA)(I QgQga)Qg\\
< \\AW(I Pp)P^Q+Q-jegA~l^\\\\Al^{I QgQTgA)Qg||.
The first multiplier is
\\All2{i-pF)p{gF6)asA-ll2\\ = ||(/-Pp)P(s+5)eg|U = sm(emax{(e+e)e5, p})
by Theorem 2.14 as dimJ7 > dim((t/ + Q) Q), while the second multiplier is
simply gapj4(S, Q) because of our assumption dimty = dimfT
44


To estimate the first term in the sum, we take with T = A1!2(IQFQrFA)QG
and ST = QqAQq and apply (2.19):
st(A1/2(I QFQ?A)QoQ^AQs) < 8^(1 -QFQTFA)Qa)\\QTGAQo\\
< sk(Al*(I QfQtfA)Qc),
using exactly the same arguments as in the proof of Lemma 2.1.
Thus, we have proved
/he 5: fJ'k + sin(0max{(f? + Q) Q, 7r})gaP&)
Changing the places of Qq and QG, we get
Mfc < Afc + sin(0max{(f? + Q) Q, ?"})gaPa(^> 60-
The statement of the lemma follows.
Remark 2.10 Let us also highlight that simpler estimates,
\iuk-fik\ which are not as sharp as those we prove in Lemmas 2.1 and 2.2, can be derived
almost trivially using orthoprojectors (see [55, 53, 22]), where this approach is
used for the case A I. Indeed, we start with identities
A1/2PfPqA~1/2 = A1/2PFPGA~1/2 + (A1/2PFA~l/2) (A1/2{Pa PG)A~1/2)
for the cosine and
A1/2(I Pf)PqA~1/2 = Afl2{I Pf)PgA-1/2
+ (A1/2(/ Pf)A~1/2) (A1/2(Pd Pg)A~1/2)
45


for the sine, and use (2.18) and Theorems 2.13 and 2.14. A norm of the second
term is then estimated from above by gapusing the fact that for an
A-orthoprojector Pp we have ||Pp||a = ||7 Pf|]a = 1-
Instead of the latter, we can use a bit more sophisticated approach, as in [22],
if we introduce the A-orthogonal projector Pg+§ on the subspace Q + Q. Then
the norm of second term is bounded by gapA{Q,Q) times \\PpPg+g\\A for the
cosine and times ||(7 Pf)Pq+q\\a for the sine, where we can now use Theorem
2.13 to provide a geometric interpretation of these two constants. This leads to
estimates similar to those of [22] for A= I:
Wk <3-fc| < cos(0min{.F, Q + £})gapA(£, G), k = l,...,q,
and
W ~ fo\ < cos(0min{J'-L, g + £})gapA(£, g), k = l,...,q.
However, the apparent constant improvement in the second estimate, for the
sine, is truly misleading as
cos^f.^, g + g}) = i
simply because dimJ7 < dim (5 + g) in all cases except for the trivial possibility
g = g, so subspaces J-x and g + Q must have a nontrivial intersection.
The first estimate, for the cosine, does give a better constant (compare to
one), but our constant is sharper; e.g.,
COS^Tninl^ (g + g)Q G}) < COs(0min{.F, £ + £})
46


Our more complex identities used to derive perturbation bounds provide an
extra projector in the error term, which allows us to obtain more refined con-
stants.
We can now establish an estimate of absolute sensitivity of cosine and sine
of principal angles with respect to absolute perturbations of subspaces.
Theorem 2.15 Let > a2 > > aq and cb > 02 > > aq be cosine of
principal angles between subspaces F, Q, and F, Q, respectively, computed in
the A-based scalar product. Then
Wk ~&k\< cigapA(F,F) + c2gap^(£,Q), k=l,...,q, (2.24)
where
ci = max{cos(6>min{(£ + Q) 0 G, F})\ cos(0min{(£ + Q) Q, F})},
c2 = max{cos((9miri{(^r + F) 0 F, Q}); cos(0min{(.F + F) F, Q})},
where 0min is the smallest angle between corresponding subspaces in the A-based
scalar product.
Proof: First, by Lemma 2.1, for k = 1,..., q,
kfc-dfcl < max{cos(#mjn{((? + G) QQ, F})\
cos (^min{(^ + Q) 0, F})}g&pA{Q, Q).
47


Second, we apply a similar statement to cosine of principal angles between sub-
spaces J~, Q and T, Q, respectively, computed in the A-based scalar product:
I dfc | < max{cos(0min{(Jr + f) T, G})\
cos^min^ +
The statement of the theorem now follows from the triangle inequality.
Theorem 2.16 Let pi < ^2 < < p,q and fi\ < ft2 < < ftq be sine of
principal angles between subspaces T, G, and J~, Q, respectively, computed in
the A-based scalar product. Then
\iak-ftk\ where
C3 = max{sin(#max{((5 + G) G, -T7}); sin(^max{(0 + G) G, -T7})},
c4 = max{sin(0rnax{(^r + £) 0 F, G})\ sin^maxiO?7 + F) f, £})},
where 0max is the largest angle between corresponding subspaces in the A-based
scalar product.
Proof: First, by Lemma 2.2, for k = 1,..., q,
K-/hi| < max{sin(0max{(£? + G) QG, F})\
sin(^max{(^ + G) 0 G, ?7})}gaP60-
48


Second, we apply a similar statement to sine of principal angles between sub-
spaces T, Q and T, Q, respectively, computed in the A-based scalar product:
|Afc ~h\ < max{sin(6>max{(^:' + T) 0 T, Q})\
sin^maxIC?7 + f) f, ^})}gapA(^, f).
The statement of the theorem now follows from the triangle inequality.
Finally, we want a perturbation analysis in terms of matrices F and G that
generate subspaces T and Q. For that, we have to estimate the sensitivity of a
column-space of a matrix, for example, matrix G.
Lemma 2.3 Let
ka{G) =
^max(^1/,2 {A1/2G)
3 mm
denote the corresponding A-based condition number ofG, where smax and smjn
are, respectively, largest and smallest singular values of the matrix Al^2G. Let
Q and Q be column-spaces of matrices G and G, respectively. Then
gapA(S, 5) < ^(G)11^1^^11. (2.26)
Proof: Here, we essentially just adopt the corresponding proof of [55] for the
A-based scalar product using the same approach as in Theorem 2.9.
Let us consider the polar decompositions
Al/2G = A1/2QGTG and A1/2G = A^Q^Tq,
where matrices Al/2QG and A1/2Qq have orthonormal columns and matrices TG
and Tq are g-by-g symmetric positive definite; e.g., TG = (QoQc)1^2- Singular
49


values of Tq and Tq are, therefore, the same as singular values of A^2G and
Al^2G, respectively. Then,
(I P6){G Gf) = (I P6)QqTg.
Therefore,
A1/2(I Pq)Qg = (A1/2(I Pg)A~1/2) A1/2{G G)Ta\
and
gapA(5,S) M1/2(G-G) ||
as HA1/2^ Pq)A l/2\\ = || J Pq\\a < 1- The statement of the lemma follows.
column scaling, which trivially does not change the column range. Our simple
Lemma 2.3 does not capture this property. A more sophisticated variant can be
easily obtained using a technique developed in [23, 22],
Our cosine theorem follows next. It generalizes results of [53, 22, 23] to
A-based scalar products and somewhat improves the constant.
Theorem 2.17 Under assumptions of Theorem 2.15,
Remark 2.11 Some matrices allow improvement of their condition numbers by
Wk ~0k\< CiKA(F)
\\A^F\\
+ C2^a(G)
\\A^{G-&)\\
life'll
(2.27)
50


The theorem above does not provide an accurate estimate for small angles.
To fill the gap, we suggest the following perturbation theorem in terms of sine
of principal angles; cf. [53, 22] for A = I.
Theorem 2.18 Under assumptions of Theorem 2.16,
[l^1/2(G-g)ll , ,
\\AWGW
(2.28)
Finally, let us underscore that all sensitivity results in this chapter are for
absolute errors. Golub and Zha in [22] observe that relative errors of sine and
cosine of principal angles are not, in general, bounded by the perturbation.
A-based scalar product in the next section.
2.6 Algorithm Implementation
In this section, we provide a detailed description of our code SUBSPACEA.m
and discuss the algorithm implementation.
Theorem 2.11 is our main theoretical foundation for a sine-based algorithm
for computing principal angles with respect to an A-based scalar product. How-
ever, a naive implementation, using the SVD of the matrix 5TA5r, may not
produce small angles accurately in computer arithmetic. We now try to explain
informally this fact, which is actually observed in numerical tests.
Let, for simplicity of notation, all principal angles be small.
Let us consider a particular case, where columns of matrices Qf and Qg
are already principal vectors in exact arithmetic. In reality, this is the situation
We consider algorithms of computing principal angles with respect to an
51


we will face in Algorithm 2.4. Then, in exact arithmetic, columns of S are A-
orthogonal and their A-norms are exactly the sine of principal angles. Thus, if
there are several small angles different in orders of magnitude, the columns of
S are badly scaled. When we take the norms squared, by explicitly computing
the product STAS, we make the scaling even worse, as the diagonal entries of
this diagonal matrix are now the sine of the principal angles squared, in exact
arithmetic. In the presence of round-off errors, the matrix STAS is usually not
diagonal; thus, principal angles smaller than 10~8 will lead to an underflow effect
in double precision, which cannot be cured by taking square roots of its singular
values later in the algorithm.
To resolve this, we want to be able to compute the SVD of the matrix STAS
without using either STAS itself or A1/2S. One possibility is suggested in the
following lemma.
Lemma 2.4 The SVD of the matrix A1/2S coincides with the SVD of the matrix
Q'gAS, where Qs is a matrix with A-orthonormal columns, which span the same
column-space as columns of matrix S.
Proof: We have
(QtsAS)t QtsAS = StAQsQtsAS = StAPsS = stas,
where Ps = QsQsA is the A-orthogonal projector on the column-space of Qs,
which is the same, by definition, as the column-space of S, so that PsS = S.
52


This contributes to the accuracy of our next Algorithm 2.3, based on Lemma
2.4, to be more reliable in the presence of round-off errors, when several principal
angles are small.
By analogy with Algorithms 2.1 and (2.2), we can remove the restriction
that matrix E is invertible and somewhat improve the costs in Algorithm 2.3 by
reducing the size of the matrix S in step 4, which leads to Algorithm 2.4.
53


Algorithm 2.3: SUBSPACEA.m
Input: real matrices F and G with the same number of rows, and a symmetric
positive definite matrix A for the scalar product, or a device to compute Ax
for a given vector x.
1. Compute A-orthonormal bases Qp = ortha(F), Qg = ortha(G) of
column-spaces of F and G.
2. Compute SVD for cosine [Y, E, Z] = svd(Q^ A Qg), E = diag (o\,..., oq).
3. Compute matrices of left Ucos = QfY and right Yos = QqZ principal
vectors.
4.
5.
6.
7.
8.
9.
Compute the matrix S = { Q A diag ^ d diag >
[ Qf Qg(QgAQf) otherwise.
Compute A-orthonormal basis Qs = ortha(S) of the column-space of S.
Compute SVD for sine: [Y, diag (pi,..., pq), Z] = svd(QgAS').
Compute matrices Usin and Yin of left and right principal vectors:
Yin = QgZ, Usin = QF(Q^AVsm)E-1 diag (QF) > diag (Qg)]
f/sin = QfZ, Yin = Qg(Q'gAUsm)S ^ otherwise.
Compute the principal angles, for k = 1,..., q:
8 k =
arccos((7fc) if cr\ < 1/2,
arcsin(/ifc) if p% < 1/2.
Form matrices U and V by picking up corresponding columns of Us-m, Yin
and Ucos, Yost according to the choice for 8^ above.
Output: Principal angles 91,.. .,9q between column-spaces of matrices F and G
in the A-based scalar product, and corresponding matrices of left, U, and
right, V, principal vectors.
Previous remarks of section 2.3 for the algorithms with A = I are applicable
to the present algorithms with self-evident changes. A few additional A-specific
remarks follow.
Remark 2.12 In step 1 of Algorithms 2.3 and 2.4, we use our SVD-based
54


function ORTHA.m for the A-orthogonalization. Specifically, computing an
A-orthogonal basis Q of the column-space of a matrix X is done in three
steps in ORTHA.m. First, we orthonormalize X, using SVD-based built-
in MATLAB code ORTH.m with a preceding explicit column scaling; see Re-
mark 2.1 on whether the scaling is actually needed. Second, we compute
[U, S, V] = svd(XTAX), using MATLABs built-in SVD code. Finally, we take
Q = XUS-1/2. If A is ill-conditioned, an extra cycle may be performed to im-
prove the accuracy. While formal stability and accuracy analysis of this method
has not been done, ORTHA.m demonstrates practical robustness in our numer-
ical tests.
55


Algorithm 2.4: Modified and Improved SUB-
SPACEA.m
Input: real matrices F and G with the same number of rows, and a symmetric
positive definite matrix A for the scalar product, or a device to compute
Ax for a given vector x.
1. Compute A-orthogonal bases Qp = ortha(A), Qq ortha(G) of
column-spaces of F and G.
2. Compute SVD for cosine [Y, S, Z\ = svd(Q^ AQg), £ = diag [u\,... ,crq).
3. Compute matrices of left Ucos = QpY and right V^0s = QgZ principal
vectors.
4. Compute large principal angles for k = 1,... ,q:
9k arccos(ofc) if u\ < 1/2.
5. Form parts of matrices U and V by picking up corresponding columns
according to the choice for 6k above. Put columns ofUcos, V^0S) which are
left, of Ucos, YC0S) in matrices Rp and Rg Collect the corresponding os
in a diagonal matrix Hr.
6. Compute the matrix S = Rq Qf{Q'rARg).
7. Compute A-orthogonal basis Qs = ortha(5) of the column-space of S.
8. Compute SVD for sine: [Y, diag (m,..., pq), Z] = svd(Q^AS).
9. Compute matrices Us-m and V^jn of left and right principal vectors:
Yiin RgZ, Usjn = Rp^RpAVsir^Hj^.
10. Compute the missing principal angles, for k l,..., q:
9k = arcsin(/ifc) if pi < 1/2.
11. Complete matrices U and V by adding columns of Us[n, T^in-
Output: Principal angles 9\,... ,9q between column-spaces of matrices F and
G, and corresponding matrices U and V of left and right principal vectors,
respectively.
56


Remark 2.13 We note that, when n p, the computational costs of SVDs
of p-by-p matrices are negligible; it is multiplication by A, which may be very
computationally expensive. Therefore, we want to minimize the number of mul-
tiplications by A. In the present version 4.0 of our code SUBSPACEA.m, based
on Algorithm 2.4, we multiply matrix A by a vector 2p+q times in the worst-case
scenario of all angles being small, in steps 1 and 7. We can avoid multiplying
by A on steps 2, 8, and 9 by using appropriate linear combinations of earlier
computed vectors instead.
Remark 2.14 Our actual code is written for a more general complex case,
where we require matrix A to be Hermitian.
Let us finally underline that in a situation when a matrix K from the fac-
torization A = KtK is given rather than the matrix A itself, we do not advise
using Algorithms 2.3 and 2.4. Instead, we recommend multiplying matrices F
and G by K on the right and using simpler Algorithms 2.1 and 2.2, according
to Theorem 2.9.
2.7 Numerical Examples
Numerical tests in the present section were performed using version 4.0 of
our SUBSPACEA.m code, based on Algorithms 2.2 and 2.4. Numerical results
presented were obtained, unless indicated otherwise, on Red Hat 6.1 LINUX
Dual Pentium-Ill 500, running MATLAB release 12, version 6.0.0.18. Tests were
also made on Compaq Dual Alpha server DS 20, running MATLAB release 12,
version 6.0.0.18 and on several Microsoft Windows Pentium-Ill systems, running
57


MATLAB version. 5.1-5.3. In our experience, Intel Pill-based LINUX systems
typically provided more accurate results, apparently utilizing the extended 80
bit precision of FPU registers of PHI; see the discussion at the end of the section.
The main goal of our numerical tests is to check a practical robustness of
our code, using the following argument. According to our analysis of section
2.5 and similar results of [5, 53, 22, 23], an absolute change in cosine and sine
of principal angles is bounded by perturbations in matrices F and G, with the
constant, proportional to their condition numbers taken after a proper column
scaling; see Theorems 2.17 and 2.28 and Remark 2.11. Assuming a perturbation
of entries of matrices F and G at the level of double precision, EPS 10-16,
we expect a similar perturbation in cosine and sine of principal angles, when
matrices F and G are well-conditioned after column scaling. We want to check
if our code achieves this accuracy in practice.
We concentrate here on testing our sine-based algorithms, i.e., for principal
angles smaller than 7r/4. The cosine-based algorithm with A I is recently
studied in [16].
Our first example is taken from [5] with p = 13 and m = 26. Matrices F
and G were called A and B in [5]. F was orthogonal, while G was an m-by-p
Vandermonde matrix: with cond(G) ~ 104. Matrix G was generated in double
precision and then rounded to single precision.
According to our theory and a perturbation analysis of [5, 53, 22, 23], in this
example an absolute change in principal angles is bounded by a perturbation in
matrix G times its condition number. Thus, we should expect sine and cosine of
58


principal angles computed in [5] to be accurate with approximately four decimal
digits.
In our code, all computations are performed in double precision; therefore,
answers in Table 2.2 should be accurate up to twelve decimal digits. We observe,
as expected, that our results are consistent with those of [5] within four digits.
k sin(0fc) cos(0*)
1 0.00000000000 1.00000000000
2 0.05942261363 0.99823291519
3 0.06089682091 0.99814406635
4 0.13875176720 0.99032719194
5 0.14184708183 0.98988858230
6 0.21569434797 0.97646093022
7 0.27005046021 0.96284617096
8 0.33704307148 0.94148922881
9 0.39753678833 0.91758623677
10 0.49280942462 0.87013727135
11 0.64562133627 0.76365770483
12 0.99815068733 0.06078820101
13 0.99987854229 0.01558527040
Table 2.2: Computed sine and cosine of principal angles of the example of [5]
In our next series of tests, we assume n to be even and p = q < n/2. Let D
be a diagonal matrix of the size p:
D = diag (<2i,..., dp), dk> 0, k = l,...,p.
We first define n-by-p matrices
F, = [I 0]T, G1 = [ID 0]T, (2.29)
59


where I is the identity matrix of the size p and 0 are zero matrices of appropriate
sizes. We notice that condition numbers of F\ and G\ are, respectively, one and
condGi =
1 + (max{diag (D)})2
1 + (min{diag (.D)})2
Thus, the condition number may be large only when large diagonal entries in D
are present. Yet in this case, the condition number of G\ can be significantly
reduced by column scaling; see Remark 2.11.
The exact values of sine and cosine of principal angles between column-
spaces of matrices F\ and G\ are obviously given by
Pk =
dk
V1 + dl
V1 + di
k = 1,...,P,
(2.30)
respectively, assuming that dk s are sorted in the increasing order. The collective
error in principal angles is measured as the following sum:
\J(l11 ~ Ai)2 + + P'p)2 + \j{&i &\)2 ---------+ (o'p dp)2, (2.31)
where ps are the sine and as are the cosine of principal angles, and the tilde
sign' is used for actual computed values.
We multiply matrices Fi and G\ by a random orthogonal matrix U of the
size n on the left to get
F2 = U* Fi, G2 = U*G i. (2.32)
This transformation does not change angles and condition numbers. It still
allows for improving the condition number of G2 by column scaling.
60


Finally, we multiply matrices by random orthogonal p-by-p matrices Tp and
Tq, respectively, on the right:
Fz = F2 Tp, G$ = G2* Tq. (2.33)
This transformation does not change angles or condition numbers. It is likely,
however, to remove the possibility of improving the condition number of Gz by
column scaling. Thus, if Gz is ill-conditioned, we could expect a loss of accuracy;
see Theorems 2.17 and 2.28 and Remark 2.11.
We start by checking scalability of the code for well-conditioned cases. We
increase the size of the problem n and plot the collective error, given by 2.31,
for the principal angles between Fz and Gz, against the value n/2. We solve the
same problem two times, using Algorithm 2.2 and Algorithm 2.4 with A I.
In the first two tests, diagonal entries of D are chosen as uniformly dis-
tributed random numbers rand on the interval (0,1). On Figure 2.1 (top) we
fix p = q = 20. We observe that the average error grows approximately two
times with a ten times increase in the problem size. On Figure 2.1 (bottom),
we also raise p = q = n/2. This time, the error grows with the same pace as the
problem size. Please note different scales used for errors on Figure 2.1.
61


x 10
Error Growth with Problem Size
Q> 4
1
0 0 Alg. 6.2, A=l
* * Alg. 3.2
A *
l V *%6o
* ^ o
Oo
:0 0
50 100 150 200 250 300 350 400 450 500
problem size: n/2
x10-u Error Growth with Problem Size
problem size: n/2
Figure 2.1: Errors in principal angles as functions of n/2
62


To test our methods for very small angles with p = q = 20, we first choose
p = 20, dk = 10_fc, k=l,...,p.
We observe a similar pattern of the absolute error as that of Figure 2.1 (top),
but do not reproduce this figure here.
In our second test for small angles, we set p = q = n/2 as on Figure 2.1
(bottom) and select every diagonal element of D in the form 10-17'rarid, where
rand is again a uniformly distributed random number on the interval (0,1). The
corresponding figure, not shown here, looks the same as Figure 2.1 (bottom),
except that the error grows a bit faster and reaches the level m 4-10-14 (compare
to 3 10-14 value on Figure 2.1 (bottom).
In all these and our other analogous tests, Algorithm 2.2 and Algorithm 2.4
with A I behave very similarly, so that Figure 2.1 provides a typical example.
In our next series of experiments, we fix a small p = q and n = 100, and
compute angles between F2 and G2 and between Fz and Gz 500 times, changing
only the random matrices used in the construction of our Fi and G{. Instead of
the collective error, given by 2.31, we now compute errors for individual principal
angles as
W -h\ + Wk cr*s|, k = i,...,p.
We tested several different combinations of angles less than 7r/4. In most cases,
the error was only insignificantly different from EPS ~ 10-16. The worst-case
scenario found numerically corresponds to
D = diag {1,0.5,10u, 1012,1013,5 1015,2 10~15,10~15,10~16,0}
63


and is presented on Figure 2.2. Figure 2.2 (top) shows the distribution of the
error for individual angles between F3 and G3 in Algorithm 2.2 in 500 runs.
Figure 2.2 (bottom) demonstrates the performance of Algorithm 2.4 with A =
I, for the same problem. The numeration of angles is reversed for technical
reasons; i.e., smaller angles are further away from the viewer. Also the computed
distribution of the error for individual angles between F2 and G2 are very similar
and, because of that, are not shown here.
We detect that for such small values of p and n most of the angles are
computed essentially within the double precision accuracy. Only multiple small
angles present a slight challenge, more noticeable on Figure 2.2 (bottom), which
uses a somewhat different scale for the error to accommodate a larger error.
Nevertheless, all observed errors in all 500 runs are bounded from above by
6 10-15 on Figure 2.2, which seems to be a reasonable level of accuracy
for accumulation of round-off errors appearing in computations with 200-by-10
matrices in double precision.
64


Error Distribution for Different Angles
x 10
error
angle number reversed
Error Distribution for Different Angles
X10
error
3 1
angle number reversed
Figure 2.2: Errors in individual angles in algorithms 2.2 and 2.4 with A = I
65


To test our code for an ill-conditioned case, we add two large values to the
previous choice of D to obtain
D = diag {1010,10s, 1,0.5,KT11, 1CT12,1(T13,5 1(T15,2 1CT15, 1(T15,1CT16,0},
which leads to condGi ~ 1010.
Figure 2.3 (top) shows the distribution of the error for individual angles
between F2 and G2 in Algorithm 2.4 with A = I after 500 runs. The numer-
ation of angles is again reversed; i.e., smaller angles are further away from the
viewer. There is no visible difference between the new Figure 2.3 (top) and
the previous Figure 2.2 for the well-conditioned case, which confirms results of
[23, 22] on column scaling of ill-conditioned matrices; see Remark 2.11. Namely,
our ill-conditioned matrix G2 can be made well-conditioned by column scaling.
Thus, perturbations in the angles should be small. Figure 2.3 (top) therefore
demonstrates that our code SUBSPACEA.m is able to take advantage of this.
66


number of hits number of hits
Error Distribution for Different Angles
error
angle number reversed
Error Distribution for Different Angles
Figure 2.3: Errors in individual angles between F2 and G2 and
3 and Gz
67


As we move to computing angles between A3 and G3 in this example (see
Figure 2.3 (bottom)), where the distribution of the error for individual angles
in Algorithm 2.4 with A I after 50.0 runs is shown, the situation changes
dramatically. In general, it is not. possible to improve the condition number
cond(G3) 1010 by column scaling. : Thus, according to [5, 23] and our pertur-
bation analysis of Theorems 2,17 and 2.28 and Remark 2.11, we should expect
the absolute errors in angles to grow ~ 1010 times (compare to the errors on
Figure 2.3 (bottom)), i.e., up to the ievel 10~5. On Figure 2.3 (bottom), which
shows actual errors obtained during 500 runs of the code, we indeed observe
absolute errors in angles at the level up to 105, as just predicted. Surprisingly,
the absolute error for angles with small cosines is much better. We do not have
an explanation of such good behavior, as the present theory does not provide
individual perturbation bounds; for different angles.
Our concluding numerical results illustrate performance of our code for ill-
conditioned scalar products.
We take G to be the first ten columns of the identity matrix of size twenty,
and F to be the last ten columns of the Vandermonde matrix of size twenty with
elements Vij i2c>~J, i, j = .!,;..., 20. Matrix F'is ill-conditioned, condA 1013.
We compute principal angles and vectors between F and G in an A-based scalar
product for the following family of matrices:
A = Ai = 10~lI + H, 1 = 1,..., 16,
where / is identity and H is the Hilbert matrix of the order twenty, whose
elements are given by hitj = l/(i+j l), i,j = 1,...,20. Our subspaces T and
68


Q do not change with /; only the scalar product that describes the geometry of
the space changes. When £ = 1, we observe three angles with cosine less than
10-3 and three angles with sine less than 10-3. When £ increases, we are getting
closer to the Hilbert matrix, which emphasizes first rows in matrices F and G,
effectively ignoring last rows. By construction of F, its large elements, which
make subspace T to be further away from subspace G, are all in last rows.
Thus, we should expect large principal angles between T and Q to decrease
monotonically as £ grows. We observe this in our numerical tests (see Figure
2.4 (top)), which plots in logarithmic scale sine of all ten principal angles as
functions of £.
69


Error vs. Condition Number
10
* . MS Windows
O 0 LINUX
* *
10' -
10
10
10
10 LLi
10 10
condition number
10
Figure 2.4: /j,k as functions of l and errors as functions of condition number
70


Of course, such change in geometry that makes sine of an angle to decrease
104 times, means that matrix Ai, describing the scalar product, gets more and
more ill-conditioned, as it gets closer to Hilbert matrix H, namely, cond(A) rj 10i
in our case. It is known that ill-conditioned problems usually lead to a significant
increase of the resulting error, as ill-conditioning amplifies round-off errors. To
investigate this effect for our code, we introduce the error as the following sum:
error = \\VTAV J|| + ||UTAU I\\ + ||£ UTAV\\,
where the first two terms control orthogonality of principal vectors and the last
term measures the accuracy of cosine of principal angles. We observe in our
experiments that different terms in the sum are close to each other, and none
dominates. The accuracy of sine of principal angles is not crucial in this example
as angles are not small enough to cause concerns. As U and V are constructed
directly from columns of F and G, they span the same subspaces with high
accuracy independently of condition number of A, as we observe in the tests.
We plot the error on the y-axis of Figure 2.4 (bottom) for a Pentium III 500
PC running two different operating systems: Microsoft Windows NT 4.0 SP6
(red stars) and RedHat LINUX 6.1 (blue diamonds), where the x-axis presents
condition number of A; both axes are in logarithmic scale. The MATLAB
Version 5.3.1.29215a (Rll.l) is the same on both operating systems.
We see, as expected, that the error grows, apparently linearly, with the
condition number. We also observe, now with quite a surprise, that the error
on LINUX is much smaller than the error on MS Windows!
71


As the same MATLABs version and the same code SUBSPACEA.m are
run on the same hardware, this fact deserves an explanation. As a result of a
discussion with Nabeel and Lawrence Kirby at the News Group sci.math.num-
analysis, it has been found that MATLAB was apparently compiled on LINUX
to take advantage of extended 80 bit precision of FPU registers of Pill, while Mi-
crosoft compilers apparently set the FPU to 64 bit operations. To demonstrate
this, Nabeel suggested the following elegant example: compute scalar product
(1 10'19 -1)T (1 1 1).
On MS Windows, the result is zero, as it should be in double precision, while
on LINUX the result is 1.084210-19.
Figure 2.4 (bottom) shows that our algorithm turns this difference into a
significant error improvement for an ill-conditioned problem.
Finally, our code SUBSPACEA.m has been used in the code LOBPCG.m
(see [34, 35]) to control accuracy of invariant subspaces of large symmetric gen-
eralized eigenvalue problems, and thus has been tested for a variety of large-scale
practical problems.
2.8 Availability of the Software
A listing for SUBSPACEA.m and the function ORTHA.m is provided in
Appendix A. Also see http://www.mathworks.com/support/ftp/linalgv5.shtml
for the code SUBSPACEA.m and the function ORTHA.m it uses, as well
as the fix for the MATLAB code SUBSPACE.m, as submitted to Math-
Works. They are also publicly available at the Web page: http://www-math.
cudenver. edu/~aknyaze v / software.
72


2.9 Conclusions Concerning Angles Between
Subspaces
Let us formulate here the main points of this section:
A bug in the cosine-based algorithm for computing principal angles be-
tween subspaces, which prevents one from computing small angles accu-
rately in computer arithmetic, is illustrated.
An algorithm is presented that computes all principal angles accurately
in computer arithmetic and is proved to be equivalent to the standard
algorithm in exact arithmetic.
A generalization of the algorithm to an arbitrary scalar product given by a
symmetric positive definite matrix is suggested and justified theoretically.
Perturbation estimates for absolute errors in cosine and sine of principal
angles, with improved constants and for an arbitrary scalar product, are
derived.
A description of the code is. given as well as results of numerical tests. The
code is robust in practice and provides accurate angles for large-scale and
ill-conditioned cases we tested numerically. It is also reasonably efficient
for large-scale applications with n~^> p.
Our algorithms are matrix-free; i.e., they do not require storing in mem-
ory any matrices of the size n and are capable of dealing with A, which
may be specified only as a function that multiplies A by a given vector.
73


MATLAB release 13 has implemented the SVD-based sine version of our
algorithm.
74


3. Rayleigh Quotient Perturbations and
Eigenvalue Accuracy
In this chapter we discuss Rayleigh quotient and Raleigh Ritz perturbations
and eigenvalue accuracy. There are two reasons for studying this problem [52]:
first, the results can be used in the design and analysis of algorithms, and
second, a knowledge of the sensitivity of an eigenpair is of both theoretical and of
practical interest. Rayleigh quotient and Raleigh Ritz equalities and inequalities
are central to determining eigenvalue accuracy and in analyzing many situations
when individual vectors and/or subspaces are perturbed.
3.1 Introduction to Rayleigh Quotient
Perturbations
Let A e RnXn be a matrix and r be a vector in Rn. In this chapter, for the
majority of results, we will assume that the matrix A is symmetric and that we
are dealing with finite dimensional real vector spaces. Let Ai < < An be the
eigenvalues of A with corresponding eigenvectors, respectively, ui,..., un. These
eigenvectors may be chosen to be orthonormal. In some cases, we will assume
that A is positive definite, with spectral condition number k(A) = ||A||||A_1|| =
A n
V
In this chapter we will avoid introducing a basis to the extent possible.
This makes these concepts and results more general, and possibly applicable to
infinite dimensional vector spaces (e.g., Hilbert spaces). Investigation into this
area is beyond the scope of this work, but promises interesting results for future
work.
75


The Rayleigh quotient p(x\ A), is defined for x G Rn with x ^ 0 as:
p(x) = p(x\ A) = (Ax, x)/(x, x), (3.1)
where (x, x) = xTx = |[x||2 as in [49]. If it is obvious which matrix A is involved,
the form p(x) will be used.
A significant application of the Rayleigh quotient involves the Courant-
Fischer Theorem. We restate here the Courant-Fischer Theorem (Theorem
10.2.1 of [49]), which is known as the minimax characterization of eigenvalues.
It is used in this investigation to obtain several important results.
Theorem 3.1 Let A E RnXn be a symmetric matrix with eigenvalues Ai <
< An. Then
Aj = mingicR" maxgegj p(g\ A), j = 1,..., n. (3.2)
dim Q3=j
Given a vector x G Rn with s^O, the residual is defined as:
r(x) = r(x; A) = (A p(x\ A)I)x, (3.3)
where I is the identity matrix. If it is obvious which matrix A is involved, the
form r(x) will be used. It is worth noting that the gradient of p(x] A) is given
by
Vp(x, A) = -r-^-v r(z; A). (3.4)
(x,x)
We need to define some concepts concerning angles between vectors and an-
gles between subspaces which are used in this chapter. The acute angle between
two non-zero vectors x, y G Rn is denoted by
/r t \ix,y)\
A{x, y} = arccos .. .... ...
76


It is important to emphasize that 0 < Z{x,yj < , by definition. The largest
Li
principal angle between the subspaces T,Q C Rn is denoted by G}. The
definition and concepts involving principal angles between subspaces is discussed
in detail in Chapter 2.
The following result is used in [33], although this reference does not include
a proof.
Lemma 3.1 Let A £ Rnxn be a symmetric matrix and let x £ Rn with x ^ 0.
Let Px be the orthogonal projector onto the subspace spanned by the vector x.
Then
r{x\A) |1
11*11
\\(I PX)APX
Proof: We have
11(7 P,)APX
- W P.)AP*y\\
v#? Il.-'li
First we prove that
(Ax: A)PX -- PXAPX.
(3.5)
Let yiiU2 £ Rn with y\ = ol\x + w\ and 7/2 = ol^x + W2, where wi,W2 are
orthogonal to x. Then
((p(x; A)PX PxAPx)yi,y2) = ((p(x; A) A)Pxyu Pxy2)
= otiOL2((p{x\ A) A)x, x) = 0.
Since y\ and y2 are arbitrary, p{x\ A)PX PXAPX = 0.
77


For any vector y, we have y = ax+w for some real a and vector w orthogonal
to x. Then
||(/ Px)APxy\\ |a|||(J Px)APxx|| \a\\\Ax PxAPxx\
IMI
IMI
\a\\\Ax p{x] A)x\\
IMI
|o!|||r(a;; A)||
IMI VM2IMI2 + MI2
and for y = x this bound is achieved, which proves the result.
<
||r(x;A)||
x\
There is another interesting result which relates p{x\ A) and r{x\ A) to the
angle between x and Ax.
Lemma 3.2 Let A 6 Rnxn be a symmetric matrix and let x G Rn with x ^ 0
and assume that Ax ^ 0. Then
\\r{x-A)\\
||Ax||
and assuming p{x\A) ^ 0, we have
\\r{x-A)\\
= sin(Z{:r, Ax}),
(3.6)
x
= \p(x; A)| tan(Z{:r, Ax}).
(3.7)
Proof: We have
cos(Z{x, Ax})
\{x,Ax)\ = \p{x-,A)\ ||g|l
IMIII^II II^MI
and
sin2(Z{x,Ax}) = 1 cos 2{Z{x,Ax}) =
\\Ax\\2-p{x-A)2\\x\\2 ||r(*;A)||s
||Ac||2 ll^ll2
This proves (3.6). Combining the results for cosine and sine, we obtain (3.7).
78


Another important concept involves what is regarded as an invariant sub-
space. A subspace S C Rn is invariant w.r.t to the matrix A, if it satisfies the
condition that if x £ S, then Ax £ 3.2 Subspaces and the RayleighRitz Procedure
Let A £ Rnxn be a symmetric matrix and S C Rn be an ra-dimensional
subspace, 0 < m < n. We can define an operator A = PsAjs on S, where
Ps is the orthogonal projection onto S and PsA\s denotes the restriction of
PsA to S. Eigenvalues of A are called Ritz values, Ai < < Am, and the
corresponding eigenvectors, u\< and Ritz vectors can be characterized in several ways as is discussed in [13, 49].
The operator A is self-adjoint, since if x, y £ {Ax,y) = (PsAx,y) = (x,APsy) = (Psx,Ay) = (x,PsAy) = (x,Ay).
Utilizing a setup similar to the Courant-Fischer Theorem (Theorem 3.1),
Ritz values are characterized, as in equation (11.5) of [49], by
Aj = minxes maxgegj p{g\ A), j = 1,..., m. (3.8)
dim S'3 =j
Here, C S instead of Qi C Rn as in (3.2).
Equivalently, let the columns of Q £ Rnxm form an orthonormal basis for S
(Theorem 11.4.1 of [49]), then A = QTAQ = QTAQ is a matrix representation
of the operator A = PsA\s in this basis, and A j, j = 1,..., m are the eigenvalues
of A.
Assuming w £ Rm, a linear map (p : Rm can be defined by The inner product (and hence the norm) is preserved under this map, since if
79


w, v G Rm then
(w, v) = (QtQw, u) = (Qto, Qv) = ( Thus

served under if.
Now we define the term residual matrix which is discussed in Section 11.4
of [49]. Given a subspace S C Rn, Q as defined above and B G RmXm, the
residual matrix is given by
R(B) =AQ- QB.
We have
||fl(i)j| = ||BWTJ4a)|| = ||Ag-g(QTiiQ)||
= ii (/ - = iia qqt)aqqt ii
= (3.10)
The residual matrix has a minimal property defined by Theorem 11.4.2 of
[49] which is repeated below:
Theorem 3.2 For a given matrix Q G Rnxm with orthonormal columns and
A = QTAQ,
||R(i)|| < \\R(B)\\ for all B G Rmxm
Several important properties related to Raleigh-Ritz approximations are
given in the following Lemma:
80


Lemma 3.3 Let the columns of Q e Rnxm form an orthonormal basis for S
and let 1. Considering the Rayleigh quotient,
p(w, A) = p{ip{w),A), Vw E Rm.
2. Considering the residual,
||r(ty,i)|| = ||r(^(^)!i)||, Vu; E Rm.
3. The eigenvectors uq, u>2,... ,wm of A and the corresponding Ritz vectors
ui,... ,um E S, Ui = i = 1,... ,m, can be chosen to be orthonor-
mal.
Proof: Let A E Rmxm be the matrix representation of A in the basis of
orthonormal columns of Q, as defined above. Then AQ = QA, and therefore
QTAQ = A.
1. Let w E Rm and let p>(w) = x E S, then considering the Rayleigh quotient,
we have
p(w,A)
('QTAQw, w) (AQw, Qw)
(w,w) (Qw,Qw)
= p(x, A) = p( 81


2. Let w E Rm and let (p(w) x E S, then considering the residual, we have
||r(tu,A)|| = \\Aw p(w, A)w\\ = \\QT AQw p(x, A)w\\
= \\QT(Ax p(x, A)x|| = ||QQT(Ax p(x, A)z||
= \\Ps{Ax p(x, A)x|| = |PsAx p(x, A)Psx||
= \\Ax p{x,A)x\\ = ||r(x, A)|| = \\r{ 3. Let w, v E Rm. Since A is self-adjoint we have
(Am, v) = (QtAQw, v) = (AQw, Qv) (w, QTAQv) (w, Av),
and A is self-adjoint (i.e. symmetric), thus A has an orthonormal set of
eigenvectors wx,W2,.. ,wm. Since Ui = ip(wi), i = 1,..., m are also orthonormal in S C Rn.
Some important properties of Ritz values and Ritz vectors are given below:
Lemma 3.4 Let S C Rn be an m-dimensional sub space with 0 < m < n, and
A G RnXn be a symmetric matrix. Let A = PsA\s, where Ps is an orthogonal
projection onto S. Let Ai < < Am be the Ritz values for A w.r.t to the
subspace S, and Ai < < An be the eigenvalues of A. Then the following
properties hold:
1. Considering the Raleigh Quotient, p(x;A) = p(x; A), Vx E S.
2. The norm of the residual is dominated as follows: ||r(a:;A)|| < ||r(x;A)||,
Vz G 82


3. A has a Ritz-pair (A, u) (i.e. Ritz value and Ritz vector), if and only if
Au Au £ e)-1.
4- The Ritz values satisfy: A; < A, < An, i = 1,..., m.
5. Assuming that A is positive definite, k(A) < k(A), where k(-) is the spec-
tral condition number.
Proof: Let x G S in. each of the following statements:
1. Based on direct calculation,.
M) = = (P4 = f>(*,A).
(X, X) {x,x) (X, X)
2. Invoking part 1 for the second equation, we have
||r(>;i)|| = \\Ax- p(x,A)x\\ = \\PsAx p(x,A)x\\
= \\PsAx p(x,A)x\\ = \\Ps(Ax p(x,A)x)\\
< \\Ps\\\\Ax-p(x,A)x\\
= \\Ax-p(x,A)x\\ = \\r(x]A)\\.
Assuming that S is invariant w.r.t A, starting with the second equation,
we have
\\r(x;A)\\ = \\PsAx p(x,A)x\\ = \\Ax p(x, A)x\\ = ||r(a;;A)||.
3. If A has a Ritz-pair (A, u), then Au Xu = 0. If y G S, then
(Au Au, y) = (Au Au, Psy) = (Ps(Au Xu),y)
= ((PsAu-Xu),y) = ((Au-\u),y) = 0.
83


and (Au Xu) G S1-.
On the other hand, if (Au Xu) G S1, then Ps(Au Xu) = 0, therefore
|| Au Au||2 = (Au Xu, Au Xu)
= (Ps(Au Xu), Ps(Au Xu)) = 0.
Thus, Au Xu, = 0 and (X,u) is a Ritz-pair of A.
4. Let Hi,... ,um be an orthonormal set of Ritz vectors. For 1 < k < m, let
Qk = (ui,...,uk). Then
Afc = maxgegk p(g\A).
This follows, since if g = J2^=i aiUi, with J2i=i ai = then
k k k
p(g\A) = (AQ2aiUi),^2aiUi) = (3.11)
i=1 i=1 i=l
Orthonormality of Ritz vectors and the fact that AHi XiUi G imply
, , I ^ if i = j\
[ 0 if
which is needed in (3.11).
Then by Theorem 3.2
Afc min^cR" maxggk p(g\A)< maxgegk p(g\ A) = Xk.
dim Qk=k
Clearly
Ak = maXpga* p(9>a) < maxff6Rn p(g;A) = Xn.
Thus Afc < Afc < An.
84


5. Invoking part 4, the spectral condition number satisfies
K(i) = Z < Z = k(a).
Ai Ai
In our analysis concerning the Rayleigh Quotient, we will consider some
subspaces that are 2-dimensional. Restricting the analysis to a 2-D subspace is
a powerful approach that can yield results, which can then be extended to larger
dimensional spaces, as in Lemma 3.4. In the literature, this is referred to as a
mini-dimensional analysis, e.g., [31, 32, 39, 44, 45, 46]. Another interesting
application of this concept is to prove that the field of values (numerical range)
of an operator is convex [19].
3.3 Analysis in 2Dimensions
Assume that a matrix A E R2x2 is symmetric and has eigenvalues Ai < A2,
with orthonormal eigenvectors, respectively, u\ and w2. Let x and
y PxU\ +P2U2, with ||x|| = 1 and ||?/|| = 1. We have that |o?i| = cos(Z{x, ui}),
\a2\ = cos(Z{x,u2}), |/A| = cos(Z{y,Wi}) and |/32| = cos(Z{y,u2}). Then
p(y) P{x) = Ai {Pi a\) + A2(/?2 a2),
and utilizing the assumptions that a\ + a2 = 1 and /32 + /?f = 1, we obtain
p(y)-p(x) = (A2-A i)(Pi-<4)
= (A2 Ai)(a!2 /51);
85


The first equation yields
p{y) p{x) (A2 Ai)(ai/?2 + O2@i) (c*l/?2 Pi),
therefore
IP(y) ~ p{x)| = (A2 ~ A 1)|(a1i0a + <*2A)I - a2j0i)|.
We know
cos(Z{a:,x/}) = |ai/3i + a2/?2|,
and expressing the angle between x and y as
sin2(Z{a:, y}) = l-(x,yf
= 1 (aiPi + q;2/32)2
= (ai/32 a2Pi)2,
we obtain
sin(Z{x,y}) = |oru02 OfeAl-
There are two very useful formulations involving sine and tangent. First
Ip(y) p(x)I = (A2 Ai)|(o!i/32 + ol2I3i)| {(atife oc2Pi)\
= (A2 Ai)|o;i/32 + a2/3i| sin(Z{a;,x/})
< (A2 Ai)(|ai||^2| + |o!2HA|) sin(Z{x)x/})
= (A2- \i)G(x,y)sm(Z{x,y}),
(3.12)
(3.13)
86


where
G(x,y) = \ai\\fo\ + \oti\\Pi\
= cos(Z{a:,wi}) sin(Z{?/,Mi})
+ sin(Z{a;,Ui}) cos(Z{?/,ui}). (3-14)
To compute the formula involving the tangent, assume cos(Z{rc, y}) ^ 0,
then
\p(y)-p(x)\ = (A2-Ai)|(ai^2 + a2A)l l(i/?2-a2)0i)|
= (A2 Ai)|(aia2 + /3i/32)|tan(Z{a;Jy})
< (A2 Ai)(|ai||a2| + l/?i]l/?2|) tan(Z{m,y}). (3.15)
We can express the residual r(x] A) in terms of ct\ and a2 as
||r(a;;A)||2 = (Ai p{x\ A))2a{ + (A2 p(x; A))2al
= (Ai (Aia2 + A2a|))2a!2 + (A2 (Aia2 + A^2))2^2
= (X2-X^ia^aj + alal)
= (A2-A1)2(a?a2)(a? + ag)
- (A.-AOV2^2)-
Thus
Similarly
||r(x;j4)|| = (A2 Ai)|ai| \a2\.
im>/iX)ii = (a2-a1)iai m
(3.16)
(3.17)
87


We note that
2
A2 Ai
lk(a0111k(s/)H
INIIIMI
< G(x,y) < 1.
(3.18)
The upper bound follows from either the Cauchy-Schwarz inequality or a double
angle formula. Considering the lower bound, we have
G{x, y)
1
A2 A
1
*0*011 1lKy)ll~
INI t IMI J
(3.19)
jj 2
where t = -Taking a minimum of this equation as a function of t, with
I <*21
t > 0, we obtain the lower bound in (3.18).
3.3.1 Eigenvalue Accuracy in 2Dimensions
We first present results in 2-dimensions for eigenvalue errors.
Lemma 3.5 Let A G R2x2 be a symmetric matrix with eigenvalues Ai < A2,
and with corresponding eigenvectors, respectively, U\ and U2, and let x G R2
with x 7^ 0. Then
PP Al = sin2(Z{a;,m}). (3.20)
A2 A1
Furthermore, if cos(Z{x,u\}) ^ 0 then
= tan2(Z{a;,w1}). (3.21)
A2 p{x)
Proof: Let y = Ui in (3.12) and without loss of generality assume ||rc|| = 1,
then since sin2(Z{:r, ui}) = 1 a\ = a^, we have
pip0 Ai 0 2/ / r in
------- = a\ = sm2(Z{a;,u1}).
A2 Ai
88