ANGLES BETWEEN SUBSPACES AND THE RAYLEIGH-RITZ METHOD
by
Peizhen Zhu
M.S., University of Colorado Denver, 2009
A thesis submitted to the
Faculty of the Graduate School of the
University of Colorado in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
Applied Mathematics
2012
This thesis for the Doctor of Philosophy degree by
Peizhen Zhu
has been approved
by
Andrew Knyazev, Advisor
Julien Langou, Chair
Ilse Ipsen
Dianne OLeary
Beresford Parlett
October 26, 2012
li
Zhu, Peizhen (Ph.D., Applied Mathematics)
Angles between subspaces and the Rayleigh-Ritz method
Thesis directed by Professor Andrew Knyazev
ABSTRACT
This thesis concerns four interdependent topics. The first one is principal angles
(or canonical angles) between subspaces (PABS) and their tangents. The second topic
is developing algorithms to compute PABS in high accuracy by using sines of half-
angles. The third topic is the Rayleigh quotient (RQ) and the eigenvalue perturbation
theory for bounded self-adjoint operators in Hilbert spaces. The last, but not least,
topic is the Rayleigh-Ritz (RR) method and mixed majorization-type error bounds
of Ritz value and eigenvalue perturbations.
PABS serve as a classical tool in mathematics, statistics, and applications, e.g.,
data mining. Traditionally, PABS are introduced and used via their cosines. The tan-
gents of PABS have attracted relatively less attention, but are important for analysis
of convergence of subspace iterations for eigenvalue problems. We explicitly construct
matrices, such that their singular values are equal to the tangents of PABS, using or-
thonormal and non-orthonormal bases for subspaces, and orthogonal projectors.
Computing PABS is one of the basic important problems in numerical linear
algebra. To compute PABS and the principal vectors in high accuracy for large,
i.e., near tt/2, and small angles, we propose new algorithms by using the sines of
half-angles. We generalize our algorithms to computing PABS in the A-based scalar
product. We also describe a numerically robust and fast way to compute PABS from
the upper triangular matrices of the QR factorizations. Perturbation analysis for the
sines and cosines of half-PABS is provided using the majorization technique. The
results of numerical tests are given, supporting the theory.
iii
Next we turn our attention to the eigenvalue perturbation theory. If x is an
eigenvector of a self-adjoint bounded operator A in a Hilbert space, then the RQ p{x)
of the vector x is an exact eigenvalue of A. In this case, the absolute change of the
RQ \p(x) p(y) \ becomes the absolute error in an eigenvalue p(x) of A approximated
by the RQ p(y) on a given vector y. There are three traditional kinds of bounds of the
eigenvalue error: a priori bounds via the angle between vectors; a posteriori bounds
via the norm of the residual; mixed type bounds using both the angle and the norm
of the residual. In this work, we examine these three kinds of bounds and derive new
identities and bounds for the change in the RQ with respect to the change of the
vectors. We propose a unifying approach to prove known bounds of the spectrum,
analyze their sharpness, and derive new sharper bounds.
Finally, we consider bounds of the absolute changes of Ritz values and eigenvalue
approximation errors. The RR method is a widely used technique for computing
approximations to eigenvalues and eigenvectors of a Hermitian matrix A, from a given
subspace. We take advantage of PABS and the residual matrix to derive several new
majorization-type mixed bounds that improve and generalize the existing results. Our
bounds hold for subspaces not necessary A-invariant and extend the mixed bound of
the absolute change of the vector RQ to multidimensional subspaces.
The form and content of this abstract are approved. I recommend its publication.
Approved: Andrew Knyazev
IV
ACKNOWLEDGMENT
I wish to express my sincere gratitude to my advisor, Professor Andrew Knyazev,
for his generous instruction, continuous patience, endless encouragement, and support
throughout my graduate studies. Without his guidance, the present dissertation
would not be possible. I just cannot thank Professor Knyazev enough.
The PhD committee Chair, Professor Julien Langou, has done a great job orga-
nizing the defense. I would also like to thank the rest of my committee members,
Professors Ilse Ipsen, Dianne 0;Leary, and Beresford Parlett, for their time, attention
and insightful comments.
I would like to express my appreciation to Dr. Merico Argentati, who has been a
knowledgeable and friendly source of advice. His comments are very valuable.
I am grateful to the faculty and my fellow students from the Department of
Mathematical and Statistical Sciences. In particular, I would like to thank Professor
Richard Lundgren and Professor Leopoldo Franca for referring and introducing me
to Professor Knyazev and his group. I would like to thank my fellow students Henc
Bouwmeester, Eric Sullivan, Kannanut Chamsri, Yongxia Kuang, and Minjeong Kim
for sharing their various knowledge.
In addition, I would like to express my gratitude to the Bateman family and the
Department of Mathematical and Statistical Sciences for their invaluable support.
My research and work on the thesis have also been supported by NSF DMS
grants 0612751 and 1115734, the PI Professor Andrew Knyazev. I have presented
some preliminary results of the thesis at the Conference on Numerical Linear Algebra
Austin, TX, July 20, 2010. My talk has been accepted to the Householder Symposium
XVIII on Numerical Linear Algebra June 12-17, 2011, but canceled because of my
pregnancy. I acknowledge the support of the conference organizers.
Finally, I would like to thank my husband Xinhua and my parents for their endless
support and love, and my daughter Jessie for being an inspiration.
TABLE OF CONTENTS
Figures .................................................................. viii
Tables...................................................................... ix
Chapter
1.Introduction............................................................... 1
1.1 Overview............................................................ 1
1.2 Notation............................................................ 5
2. Principal angles between subspaces (PABS) and their tangents.............. 7
2.1 Definition and basic properties of PABS............................. 7
2.2 Tangents of PABS................................................... 10
2.2.1 tan 0 in terms of the orthonormal bases of subspaces... 11
2.2.2 tan 0 in terms of the bases of subspaces............... 16
2.2.3 tan in terms of the orthogonal projectors on subspaces ... 21
2.3 Conclusions ....................................................... 26
3. New approaches to compute PABS ....................................... 27
3.1 Algebraic approaches to compute sin (and cos (.) 29
3.2 Geometric approaches to compute sin ).............................. 37
3.3 Computing sin and cos ff) without using orthonormal bases . 42
3.4 Computing sin ()and cos (in the A-based scalar product ... . 49
3.5 Matlab implementation.............................................. 58
3.6 Numerical tests ................................................... 60
3.7 Discussion......................................................... 67
3.8 Conclusions ....................................................... 70
4. Majorization inequalities................................................ 71
4.1 Definition and properties of majorization.......................... 71
4.2 Some known majorization results ................................... 73
4.3 Unitarily invariant norms, symmetric gauge functions, and majorization 75
vi
4.4 New majorization inequalities for singular values and eigenvalues . 77
4.5 Perturbation analysis for principal angles........................... 81
4.6 Conclusions ......................................................... 92
5. Bounds for the Rayleigh quotient and the spectrum of self-adjoint operators 94
5.1 Introduction......................................................... 94
5.2 Short review of some known error bounds for eigenvalues ............. 96
5.3 Key identities for RQ................................................ 98
5.3.1 Identities for the norm of the residual r (x) = Ax p(x)x . 99
5.3.2 Identities for the absolute change in the RQ.................100
5.4 Deriving some known eigenvalue error bounds from our identities ..104
5.5 New a posteriori bounds for eigenvalues..............................105
5.6 Improving known error bounds for eigenvectors........................108
5.7 Conclusions .........................................................Ill
6. Majorization bounds for the absolute change of Ritz values and eigenvalues 112
6.1 Rayleigh-Ritz method.................................................114
6.2 Perturbation of eigenspaces..........................................115
6.3 Majorization-type mixed bounds.......................................117
6.4 Discussion...........................................................126
6.5 Conclusions .........................................................130
7. Conclusions and future work.................................................131
Appendix
A. MATLAB code.................................................................133
References.....................................................................150
Vll
FIGURES
Figure
2.1 The PABS in 2D.........................................................11
2.2 Geometrical meaning of r =3(3)................................ 24
3.1 Smes and cosines or nalr-angles........................................ 32
3.2 Sine of half-angle between x and y in M.n.............................. 38
3.3 Errors in principal angles: p = 20 top and p as a function of n such that
p = n/2 bottom; left for Algorithm 3.1 and Algorithm 3.4 with A = /,
and right for Algorithm sine-cosine and Algorithm sine-cosine with A L 62
3.4 Errors in the principal angles for Algorithms 3.1, 3.2, and sine-cosine based
algorithm with D = diag(10_16rand^1^) and p = 20...................... 63
3.5 Error distributions for individual angles: top three for Algorithm 3.1,
Algorithm 3.2, and Algorithm sine-cosine; bottom three for Algorithms
3.4, 3.5, and sine-cosine with A =1................................... 65
3.6 Errors in the principal angles for Algorithms 3.3 and 3.6.............. 66
3.7 Sine of angles as functions of k for Algorithms 3.4, 3.5, and 3.6 (top and
left on bottom); Errors as functions of condition number for Algorithms
3.4, 3.5, and 3.6 (right on bottom)................................... 67
4.1 Geometry of majorization in 2D and 3D.................................. 72
6.1 Comparison of the absolute change of Ritz values for the mixed bounds
and for the a priori bounds with the exact absolute change of Ritz values
(left); The corresponding tangent of the largest principal angle (right)..129
6.2 Comparing the mixed type bound of the absolute change of Ritz values
with the exact absolute change of Ritz values.........................129
viii
TABLES
Table
2.1 Different matrices in T\ matrix T G J7 using orthonormal bases...... 15
2.2 Different expressions for T......................................... 25
2.3 Different formulas for T with 0(< tt/2: dim(< dim(V) (left);
dim( dim(3^) (right)........................................... 26
3.1 Algorithm 3.1...................................................... 33
3.2 Algorithm 3.2...................................................... 41
3.3 Algorithm 3.3...................................................... 46
3.4 Algorithm 3.4...................................................... 53
3.5 Algorithm 3.5...................................................... 54
3.6 Algorithm 3.6...................................................... 57
3.9 Computed sines and cosines of principal angles.................... 61
3.10 Algorithms to compute PABS and PABS m the A-based scalar product. 68
3.11 Comparison for algorithms to compute PABS......................... 69
3.12 Comparison for algorithms to compute PABS m the A-based scalar prod-
uct (1)............................................................. 69
IX
1.Introduction
1.1 Overview
This dissertation mainly consists of four parts. The first part deals with principal
angles (or canonical angles) between subspaces (PABS) and their tangents. The
second part develops algorithms to compute PABS in high accuracy by using sines
of half-angles. The third part studies the Rayleigh quotient (RQ) and the eigenvalue
perturbation theory for bounded self-adjoint operators in Hilbert spaces. The last part
explores the Rayleigh-Ritz (RR) method and properties of Ritz values and eigenvalues
of Hermitian matrices.
The history of PABS dates to Jordan in 1875 [40]. In statistics, the cosines of
principal angles and principal vectors are interpreted as canonical correlations and
canonical variables, correspondingly. PABS are widely used in many applications,
e.g., in ecology, information retrieval, random processes, face recognition, and system
identification; e.g., [27, 39, 43, 45, 78]. The cosines, sines, or tangents of PABS
are commonly used to measure the accuracy of numerical computation of invariant
subspaces of matrices; e.g., [47, 51,64, 73].
The sines and especially cosines of PABS are well studied; e.g., in [9, 74]. Tangents
of PABS also appear in applications, but are less investigated. In [10, 20, 52, 71,73],
researchers define the tangents of PABS of the same dimension via the norm or
singular values of some matrices, and use the tangents as analytic tools. Specifically,
the tangents of PABS are related to the singular values of a matrixwithout explicit
formulation of the matrixin [73, p. 231-232] and [71, Theorem 2.4, p. 252]. The
tangent of the largest principal angle is obtained from the norm of a matrix in [10, 20].
In [52], the tangents are used to derive convergence rate bounds of subspace iterations.
In Chapter 2, we briefly review the concepts and the most fundamental and im-
portant properties of PABS. We give explicit matrices, such that their singular values
are the tangents of PABS. We construct matrices using three different scenarios:
1
orthonormal bases, non-orthonormal bases, and orthogonal projectors. Using a geo-
metric approach, new constructions of such matrices are obtained. In addition, our
results include the formulations of matrices consistent with those in [10, 20, 52, 71,73].
Furthermore, we present the tangents of PABS with possibly different dimensions.
Computing PABS is one of the basic important problems in numerical linear
algebra. An efficient method, based on singular value decomposition (SVD) for cal-
culating cosines of PABS, is proposed by Bjorck and Golub [9]. The cosines of prin-
cipal angles and principal vectors are obtained by using the SVD of XHY^ where the
columns of matrices X and Y form orthonormal bases for the subspaces X and 3^,
correspondingly.
However, the cosine-based algorithm cannot provide accurate results for small an-
gles, i.e., angles smaller than 10-8 are computed as zero in double precision arithmetic
with EPS ^ 10-16. For the same reason, the sine-based algorithm loses accuracy for
computing large principal angles, i.e., close to tt/2. A remedy is to combine sine
and cosine based algorithms proposed by Knyazev and Argentati [48], where small
and large angles are computed separately. They introduced a threshold to compute
the angles less than the threshold using the sine-based approach and to compute the
angles greater than the threshold using the cosine-based approach. However, this
algorithm is complicated, and introduces a new potential difficultyaccurate com-
putation of the principal vectors corresponding to the PAB^ m the neighborhood of
the threshold.
In Chapter 3, we propose new algorithms to compute PABS and the principal
vectors using the sines of half-PABS. Our new algorithms provide accurate results
for all PABS m one sweep and can be implemented simply and efficiently. We also
propose a method to compute PABS for sparse data. Furthermore, we generalize our
algorithms to compute the principal angles and the principal vectors in an A-based
scalar product as well, where Aisdu positive definite Hermitian matrix. Perturbation
2
analysis for the sines and cosines of half-PABS is provided in this work. Results of
extensive numerical tests are included.
In Chapter 4, we briefly review some existing majorization inequalities based on
the singular values and the eigenvalues of matrices. Using majorization technique, we
bound the absolute changes of eigenvalues of two Hermitian matrices in terms of the
singular values of the difference of two matrices. We also use majorization technique
to bound the changes of sines and cosines of half-PABS, and of tangents of PABS.
If x is an eigenvector of a self-adjoint bounded operator A in a Hilbert space, then
the RQ p{x) of the vector x is an exact eigenvalue of A. In this case, the absolute
change of the RQ \p(x) p(y) \ becomes the absolute error in an eigenvalue p(x) of A
approximated by the RQ p(y) on a given vector y. There are three traditional kinds of
bounds of eigenvalue errors: a priori bounds [49, 53, 67] via the angle between vectors
x and y\ a posteriori bounds [13, 44, 64] via the norm of the residual Ay p(y)y of a
vector y\ and mixed type bounds [47, 75] using both the angle and the norm of the
residual. In Chapter 5, we propose a unified approach to prove known bounds of the
spectrum, analyze their sharpness, and derive new sharper bounds.
The RR method [64, 73] is a classical method for eigenvalue approximation of
Hermitian matrices. A recent paper by Knyazev and Argentati [51] presents a priori
error bounds of Ritz values and eigenvalues in terms of the principal angles between
subspaces X and 3^ using a constant. In [5, 64, 73], researchers derive a posterior
bounds of eigenvalues in terms of the singular values of the residual matrix. Another
a posterior bound obtained by Mathias [56] uses the residual matrix and the gap.
In Chapter 6, we take advantage of the principal angles and the residual matrix
to present new majorization-type mixed bounds. We improve the bounds in [61,75]
if one of the subspaces is A-invariant. In addition, our bounds are more general, since
they hold for subspaces not necessary A-invariant. We generalize the mixed bound
of the absolute change of the RQ \p(x) p(y)\ to multidimensional subspaces. This
3
work also provides majorization-type mixed bounds for Ritz values and eigenvalues,
and for RQ vector perturbation identities for application in generalized eigenvalue
problems. We compare our new bounds with the known ones.
We present our conclusions and recommend some further work in Chapter 7.
The main new results obtained in the thesis are summarized as follows:
1. We analyze the tangents of PABS related to the SVD of explicitly constructed
matrices. New properties of PABS are obtained.
2. Original numerical algorithms are developed, based on the sines of half-PABS,
to compute PABS and principal vectors for column ranges of dense and sparse
matrices. Moreover, we generalize our algorithms to compute PABS in the
A-based scalar product.
3. We use majorization techniques to present novel perturbation analysis for sines
and cosines of half-PABS.
4. A unifying approach is proposed, based on new RQ vector perturbation iden-
tities, to prove bounds of the operator spectrum, analyze their sharpness, and
derive new sharp bounds.
5. We present several majorization-type mixed bounds for the absolute changes
of Ritz values and the RR absolute eigenvalue approximation error. The
majorization-type mixed bounds are in terms of the principal angles and the
residual matrix.
4
1.2 Notation
A = AH A Hermitian matrix in Cnxn or a self-adjoint bounded operator.
S(A) The vector or singular values of A and 5(A) = [si(A)^..., sn(A)]
S+(A) The vector of positive singular values of A.
S^(A) The singular values of A are arranged in nonincreasing order.
Sm^(A) The vector of the largest m singular values of A m nonincreasing order.
Si(A) The vector of the smallest m singular values of A in nonincreasing order.
5 max (^4) The largest singular value of A.
^min (^4) The smallest singular value of A.
w Vector norm of x.
Pll Spectral norm of the matrix A, i.e., \\A\\ = smax(A).
PI Frobenius norm of A.
IPIII Unitarily invariant norm of A.
Subspaces of Cnxn.
1 The orthogonal complement of the subspace X.
The column space of a matrix X.
Px Orthogonal projector on the subspace X or TZ(X).
P^ Orthogonal projector on the subspace
A The vector or eigenvalues of A and (A) = [A^A)n(A)].
A The eigenvalues of A are arranged in nonincreasing order.
0(^,3^) The vector of principal angles between the subspaces X and 3^, and 0(y) = [6*1...(9P].
H^,y) Principal angles are arranged in nonincreasing order. 5
;^) Principal angles are arranged in nondecreasing order.
(0,/2) The vector of principal angles in (0,tt/2).
eA(x,y) The vector of principal angles in the A-based scalar product between X and 3^ .
max(x,y) The largest angle between subspaces X and 3^.
min(x,y) The smallest angle between subspaces X and 3^.
rank i A) The rank of the matrix A.
x -< y The vector x is majorized by the vector y.
'w y The vector x is weakly majorized by y.
The absolute value of the vector x.
At Moore-Penrose inverse of A.
re re(A) =
pol .. The square root of the matrix AHA^ i.e., \A\po\ = {AHA)1/2. An inner product, associated with a norm by 2 =(*,*).
_B The scalar product induced by BJ where B is positive definite
The condition number of a matrix A in the 2 norm, such that _ Smax()/Smin().
p(x) The Rayleigh quotient of A with respect to the vector x.
H A real or complex Hilbert space.
spec(A) Spectrum of the bounded operator A.
6
2. Principal angles between subspaces (PABS) and their tangents1
The concept of principal angles between subspaces (PABS) is first introduced by
Jordan [40] in 1875. Hotelling [37] defines PABS in the form of canonical correlations
in statistics in 1936. Numerous researchers work on PABS; see, e.g., our pseudo-
random choice of initial references [18, 38, 50, 73, 74, 79], out of tens of thousand of
Internet links for the keywords principal angles and canonical angles.
In this chapter, we first briefly review the concept and some important properties
of PABS in Section 2.1. Traditionally, PABS are introduced and used via their sines
and more commonly, because of the connection to canonical correlations, cosines. The
tangents of PABS have attracted relatively less attention. Our interest to the tangents
of PABS is motivated by its applications in theoretical analysis of convergence of
subspace iterations for eigenvalue problems.
We review some previous work on the tangents of PABS in Section 2.2. The main
goal of this chapter is explicitly constructing a family of matrices such that their
singular values are equal to the tangents of PABS. We form these matrices using sev-
eral different approaches: orthonormal bases for subspaces in Subsection 2.2.1, non-
orthonormal bases in Subsection 2.2.2, and orthogonal projectors in Subsection 2.2.3.
Throughout this chapter, we also discover new properties of PABS.
2.1 Definition and basic properties of PABS
In this section, we remind the reader of the concept of PABS and some funda-
mental properties of PABS. First, we recall that an acute angle between two nonzero
vectors x and y is defined as
cos0(x,y) = \xHy\,
with xHx = yHy =1,where 0 < 6(x^y) < tt/2.
1The material of this chapter is based on our manuscript, P. Zhu and A. V. Knyazev, Principal
angles between two subspaces and their tangents^ submitted to Linear Algebra and its Applications.
Also, it is available at http://arxiv.Org/abs/l209.0523.
7
The definition of an acute angle between two nonzero vectors can be extended to
PABS; see, e.g., [9, 24, 30, 37].
Definition 2.1.1 Let C Cn and Y C Cn 6e dim()=p and
dim(3^) = q. Let m = min (p, q). The principal angles
0(^,3^) = [6*i,...,6*m] where 9k G [0,tt/2], fc =1,...,m,
between X and y are recursively defined by
=cos(4) = maxmaxlxHyl = I
subject to
IIII = IMI =ixIlxi = > yHVi = > i =1,...,k -l.
The vectors {x\,..., xm} and {...ym} are called the principal vectors.
An alternative definition of PABS is proposed in [9, 30] based on the singular
value decomposition (SVD) and reproduced here as the following theorem. It is
shown in [24] that the SVD approach to define PABS is a direct consequence of the
recursive definition.
Theorem 2.1.2 Let the columns of matrices X G Cnxp and Y G Cnxq form or-
thonormal bases for the subspaces X and y, correspondingly. Let the SVD of XHY
he UTjVh ^ where U and V are unitary maznces and ^ is a p by q diagonal matrix
with the real diagonal elements Si(XhY)^ ..., sm(XHY) in nonincreasing order with
m = min(p, q). Then
cseW j) = s (xHy) = [Sl(xHy).. (xHy)]
where denotes the vector of principal angles between X and y arranged in
nondecreasing order and 5(A) denotes the vector of singular values of A. Moreover,
the principal vectors associated with this pair of subspaces are given by the first m
columns of XU and YV, correspondingly.
Theorem 2.1.2 implies that PABS are symmetric, i.e. 0(^,3^) = 0(3^,^), and
unitarily invariant, since (UX)H(UY) = XHY for any unitary matrix U G Cnxn.
A number of other important properties of PABS have been established, for finite
dimensional subspaces, e.g., in [23, 38, 50, 73, 74, 79], and for infinite dimensional
subspaces in [18, 53]. We list a few of the most useful properties of PABS below.
Property 2.1.3 [74] In the notation of Theorem 2.1.2, let p > q and q
[XX] he a unitary matrix. Let Si(X^Y) > > sq(X^Y) he singular values of
XfY. Then
sk{XfY) = sm(9q+1_k), 1..¢.
Relationships of principal angles between X and 3^, and between their orthogonal
complements are thoroughly investigated in [38, 50, 53]. Let the orthogonal comple-
ments of the subspaces X and 3^ be denoted by and correspondingly. The
nonzero principal angles between X and 3^ are the same as those between and
y-1. Similarly, the nonzero principal angles between X and y1-are the same as those
between X1-and 3^.
Property 2.1.4 [38, 50, 53] For the subspaces X, y and their orthogonal comple-
ments, we have
i. [ey),..] = [e)...].
w/zere ere are max(n dim(dim(Y)0) additional Os on the left and
max(dim(A,) + dim(3^) n,0) additional Os on the right.
[00...0] = [0(y)0...0].
where there are dim(X)Q) additional Os on the left and there
are max(dim(A,) dim(3^),0) additional Os on the right.
* [f...f)]=[f (),...]
where there are max(dim(A,) dim(3^),0) additional ir/2s on the left and
9
max(dim(A,) + dim(3^) n,0) additional Os on the right.
These statements are widely used to discover new properties of PABS.
2.2 Tangents of PABS
The tangents of PABS serve as an important tool in numerical matrix analysis.
For example, in [10, 20], the authors use the tangent of the largest principal angle
derived from a norm of a specific matrix. In [71, Theorem 2.4, p. 252] and [73, p. 231-
232] the tangents of PABS, related to singular values of a matrixwithout an explicit
matrix formulationare used to analyze perturbations or invariant subspaces. In [52],
the tangents are used to derive the convergence rate bounds of subspace iterations.
In [10, 20, 52, 71,73], the two subspaces have the same dimensions.
Let the orthonormal columns of matrices X, X_l, and Y span the subspaces
the orthogonal complement of X ^ and 3^, correspondingly. According to Theo-
rem 2.1.2 and Property 2.1.3, 008 0(^,3^) = S(XHY) and sin0(^,3^) = S(X^Y).
The properties of sines and especially cosines of PABS are well investigated; e.g., see
in [9, 74]. The tangents of PABS have attracted relatively less attention. One could
obtain the tangents of PABS by using the sines and the cosines of PABS directly,
however, this is not the most efficient or accurate approach in some situations.
In this work, we construct a family T of explicitly given matrices, such that the
singular values of the matrix T are the tangents of PABS, where T G J7. We form T
in three different ways. First, we derive T as the product of matrices whose columns
form the orthonormal bases of subspaces. Second, we present i1 as the product of
matrices with non-orthonormal columns. Third, we form T as the product of or-
thogonal projections on subspaces. For better understanding, we provide a geometric
interpretation or singular values of T and of the action of T as a linear operator.
Our constructions include the matrices presented in [10, 20, 52, 71,73]. Further-
more, we consider the tangents of principal angles between two subspaces not only
with the same dimension, but also with different dimensions.
10
2.2.1 tan in terms of the orthonormal bases of subspaces
The sines and cosines of PABS are obtained from the singular values of explicitly
given matrices, which motivate us to construct T as follows: if matrices X and Y
have orthonormal columns and XHY is invertible, let T = X^Y (^XHY^j \ then
tan 0(^,3^) can be equal to the positive singular values of T. We begin with an
example in two dimensions in Figure 2.1.
Figure 2.1: The PABS in 2D.
cos /3 sin/3 cos (6 + P)
Let X = = , and Y =
sin/3 cos /3 sin (8 + P)
where 0 < 61 < tt/2 and /3 G [0,tt/2]. Then, XHY = cos^ and X^Y = sin^.
Obviously, tan6* coincides with the singular value of T = X^Y ^XHY^ \ If 61 = tt/2,
then the matrix XHY is singular, which demonstrates that, in general, we need to
use its Moore-Penrose Pseudoinverse to form our matrix T = X^Y
The Moore-Penrose Pseudoinverse is well known. For a fixed matrix A G Cnxm,
the Moore-Penrose Pseudoinverse is the unique matrix A"' G Cmxn satisfying
AAM = = #A4=A4#=AM.
11
Some properties of the Moore-Penrose Pseudoinverse are listed as follows.
HA UTjVh is the singular value decomposition of A, then = VY^UH.
HA has full column rank, and B has full row rank, then {AB)^ .
However, this formula does not hold for general matrices A and
AA' is the orthogonal projector on the range of A, and A"'A is the orthogonal
projector on the range of AH.
For additional properties of the Moore-Penrose Pseudoinverse, we refer the reader
to [73]. Now we are ready to prove that the intuition discussed above, suggesting to
try T = XfY (XhY)\ is correct. Moreover, we cover the case of the subspaces X
and 3^ having possibly different dimensions.
Theorem 2.2.1 Lei X G Cnxp have orthonormal columns and he arbitrarily com-
pleted to a unitary matrix [XX\. Let the matrix Y G Cnxq he such that YHY = I.
Then the positive singular values S+(T) of the matrix T = XfY [XhyY satisfy
tanQ(7l(X)^7l(Y)) = [oc, S+(T)^ (2.2.1)
where TZ(.) denotes the matrix column range.
Proof: Let [y y_j_] be unitary, then [x X]H[YY] unitary, such that
[xx]h[yy]
q n q
V xhy xhy
n p XfY X^Y
Applying the classical CS-decomposition (CSD), e.g., [22, 28, 62, 63], we get
H
[xx]h[yy]
XhY XhY
XfY X^Y
t/l
u2
D
12
where U\ G W(p), U2 G U(n p), Vi G W(g), and V2 G U(n q). The symbol U{k)
denotes the family of unitary matrices of the size k. The matrix D has the following
structure:
r s qrs npq-\-r s prs
D =
r s prs 1 Cl 0 0H Si I
npq-\-r 0 I
s Si
qrs I 0H
where C\ = diag(cos(6li),...,cos(6l8)), and S\ = diag(sin(6li),...,sin(6s)) with k G
(0, tt/2) for fc =1,...,s, which are the principal angles between the subspaces TZ(Y)
and TZ(X). The matrices C\ and S\ could be empty. Matrix 0 is the matrix of zeros
and does not have to be square. / denotes the identity matrix. We may have different
sizes of / in D.
Therefore,
T = XfY {XhY)] = U2 0 Si Vfv, I Cl t Uf = u2 0 SiC^1
I 0 0H
Hence, S^(T) = (tan(6li),...,tan(6l8)), where 0 < 61 < 62 < - < 6S < tt/2.
From the matrix 15, we obtain 5" = 5" (diag(/C"i0)). According to The-
orem 2.1.20(7^(X)7^(y)) = [0,0,6^...tt/2, ...tt/2]where (9 G (0tt/2)
for fc =1,...,s, and there are min(q r s^p r s) additional tt/2^s and r additional
07s on the right-hand side, which completes the proof.
Remark 2.2.2 If we know the subspaces Xy and their orthogonal complements
explicitlywe can obtain the exact sizes of block matrices in the matrix D. Let us
13
consider the decomposition of the space Cn into an orthogonal sum of five subspaces
as in [11,31,53]7
Cn =
^ere00 =01 = tny10 =n =ny(=(X)
and y = TZ{Y)). Using Tables 1 and 2 in [53] and Theorem 3.12 in [41]7 we get
dim() = r
dim(9?ti) = g r s,
diir^SDtn) = n n <7 + r
dim(9?ti)=p r s,
w/zere SDt = =2%
=1n (01)
=1n (10n)
9% n (10)
n0?tol
and s = dim(SDt^) = dim(dJly) = dim(dJlx^) = dim(SDt^?). Thus, we can represent
D as
dim(OTc dim(OT)/2 dim(OT10) dimfOTn) dim(OT)/2 dim(OT01)
dim(OT) i 0H
dim(OT)/2 c\ Si
0 I
dim(OTn) 0 I
dim(OT)/2 Si
I 0H
In adaition^ it is possible to permute trie flrst q columns or the last n q columns of
D, or the first p rows or the last n p rows and to change the sign of any column or
urn to obtain the variants of the CSD.
14
Remark 2.2.3 In Theorem 2.2.1, there are additional dim(Xny) 0\s and additional
min (dim( H Y)dim( H )ocs in [,...0, S"+(T)0,0].
Due to the fact that the angles are symmetric, i.e., 0(^,3^) = 9( the
matrix T in Theorem 2.2.1 could be substituted with Yj1 X(YHXy. Moreover, the
nonzero angles between the subspaces X and 3^ are the same as those between
the subspaces and .Hence, T can be presented as XHYj_(X^Yj_y and
YHX_L(Y(1X_Ly. Furthermore, for any matrix T, we have S(T) = which
implies that all conjugate transposes of T hold in Theorem 2.2.1. Let T denote a
family of matrices, such that the singular values of the matrix in T are the tangents
of PABS. To sum up, any of the formulas for T in J7 in the first column of Table 2.1
can be used in Theorem 2.2.1.
Table 2.1: Different matrices in T\ matrix T G J7 using orthonormal bases.
XlY (XhyY ZVY :1^)t
YfX{YHXy Fy1(1^
xhy{x^yY PXY(X^YY
YHX{YfXy PyX(YfXy
{yhxYyhx
{xhyYxhy (xHyyxHpyi
{YfXYYfX (Yfxyrfpx
{XfY^XfY {XfY^XfPy
Using the fact that the singular values are invariant under unitary multiplications,
we can also use for in Theorem 2.2.1, where is ail orthogonal
projector onto the subspace Thus, we can use T as in the second column in
Table 2.1, where ,,and denote the orthogonal projectors on the subspace
, and ,correspondingly. Note that =/ and =/ _Py.
15
Remark 2.2.4 From the proof of Theorem 2.2.1 ^ we immediately derive that
XfY {XhY)] = -(YfXyYfX and YfX(YHXy = -{XfY^XfY.
Remark 2.2.5 Let XfY {XhY)] = U2^iU1 and YfX(YHXy = V2E2Vh be the
SVDs, where Ui and (z =1,2) are unitary matrices and (z =1,2) are diagonal
matrices. Let the diagonal elements of Si and S2 be in the same order. Then the
principal vectors associated with the pair of subspaces X and y are given by the
columns of XUi and YVi.
2.2.2 tan in terms of the bases of subspaces
In the previous section, we use matrices X and Y with orthonormal columns
to formulate T. An interesting question is whether we can choose non-orthonormal
bases for these subspaces to get the same result as in Theorem 2.2.1. It turns out
that we can choose a non-orthonormal basis for one of the subspaces X or y. Before
we present the theorem, we need the following lemma.
Lemma 2.2.6 For integers q
the matrixY G Cnxq be such thatXHY has full rank. Let us define Z = Y(YHY)~1^2,
f = Xfz (XHZ)\ and T = XfY {XHY)]. Then T = T.
Proof: The matrix XHY(p x q) has full rank and q < which implies that the
matrix Y has full rank, i.e., dim(7?.(y)) = dim(7l(XHY)) = q. So, the matrix YHY is
nonsingular. Moreover, the columns of Z = Y(YHY)~1^2 are orthonormal and span
TZ(Y). Using the fact that if A is of full column rank, and B is of full row rank, then
(ABy = it follows that (XHZY = {XHY{YHY)~l/2y = {YHY)l/2{XHYy.
Finally, we have
f = Xfz {XHZ)] = XfY{YHY)~l/2{YHY)l/2{XHY)] = XfY{XHY)] = T.
16
In general, if g > p, the matrices T and T are different. For example, let
1 0 0 ii
X = 0 = 10 , and Y = 0 1
0 0 1 0 0
Then, we have =[1i]and [1/2 1/2.Thus,
T^XfY (X^y)f = [1/2 0]^-
On the other hand, we obtain T = X^Z = [ ]-^.
From the proof of Lemma 2.2.6, we see that the matrix (XHZy is not equal to
(YHY)1^2(XHYy for the case q > since in general the formula {AB)^ does
not hold. Furthermore, in the example above we have
cos0(^(X),^(y)) = s(XhY(YhY)~1/2)=1,
which implies that tan0(7?.(X),7?.(y)) = 0. However, s(T)=1/2, which also shows
that the condition q < p is necessary in the following theorem.
Theorem 2.2.7 For integers q < p7 let X G (Cnxp have orthonormal columns and be
arbitrarily completed to a unitary matrix [XXi]. Let Y 6 Cnxq be such that XhY
has full rank. Let T = XY [XhyY Then
tan0(^(X),^(y)) = [5+(T),
where there are dim(TZ(X) r\TZ(Y)) 0\s on the right.
Proof: Combining Theorem 2.2.1, Remark 2.2.3, and Lemma 2.2.6 directly
proves this theorem.
Remark 2.2.8 We make the explicit matrix T in Theorem 2.2.1 connection to the
implicit matrix which is related to the tangents of angles described m [73, 71].Using
the idea described in [73, p. 231-232]7 [71, Theorem 2.4, p. 252]7 and [52]7 we construct
Z as X+X_\_T. Since [XHYy XHY = Ithe following identities Y = PxY + Pp^Y =
XXhY + XX^Y = ZXhY imply that TZ(Y) C TZ(Z). By direct calculation^ we
easily obtain that
XhZ = Xh(X + XT) = Id,iidZHZ= (X + XT)h(X + XT) = I + ThT.
Thus, XHZ (yZHZ) = (/ + thT)~1/2 is Herrnitian positive definite. The ma-
trix Z{ZUZ)~xl2 by construction has orthonormal columns which span the space Z.
Moreover, we observe that
S (^XHZ(ZHZ)-^ = A ((/ + ThT)~-^ =[(1 + (T))-^ ,...,(l + s2p (T))-1],
where A[.) denotes the vector of eigenvalues.
Therefore, tanQ(7l(X)^7l(Z)) = [^(T), 0,..., 0] and dim(X) = dim(Z). From
Theorem 2.2.7, we have tan0()7r/2)(7^(X),7^(y)) = tan0()7r/2)(7^(X),7^(Z)), where
{/2) denotes all PABS in {0,it/2). In other words, the angles in {0,it/2) be-
tween subspaces TZ(X) and TZ(Y) are the same as those between subspaces TZ(X)
andTZ(Z). For the case p = q) we can see that Q(TZ(X)^TZ(Y)) = Q(TZ(X)^TZ(Z)).
We give an explicit matrix X^Y [XHY^j 1 for the implicit matrix P7 where S(P)=
tanQ(7l(X)^7l(Y))? in [73, p. 231-232] and [71, Theorem 2.4, p. 252]. The case
p = q is considered in [52]. We extend their result to the case q
Corollary 2.2.9 Using the notation of Theorem 2.2.1^ let p = q and he an
orthogonal projection on the subspace TZ(X_\_). We have
tan0((X)T)) = S .
Proof: Since the singular values are invariant under unitary transforms, it is easy
to obtain S (^X^Y (^XHY^j 1 j = 5" Y . Moreover, the
mber of the singular values of y (X1 is p, hence we obtain the result.
Above we always suppose that the matrix Y has full rank. Next, we derive a
similar result as in Theorem 2.2.7, if y is not a full rank matrix.
18
Theorem 2.2.10 Let [X X_l] he a unitary matrix with X G Cnxp. Let Y G
£nxq an(^ ran]^(y) = rank(X-^y) < p. Then the positive singular values of
T = X^Y [XHY^satisfy tanQ(TZ(X)^7l(Y)) = [^(T), 0,..., 0], where there are
dim(TZ(X) DTZ(Y)) 0Js on the right
Proof: Let the rank of Y be r, thus r < p. Let the SVD of Y be UTjVh ^ where U is
an n x n unitary matrix; S is an n x g real rectangular diagonal matrix with diagonal
entries Si,...,0,..., 0 ordered by decreasing magnitude such that Si > S2 > >
> 0, and V is du q x q unitary matrix. Since rank(y)=r < g, we can compute a
reduced SVD such that Y = Ur^V/1. Only the r column vectors of U and the r row
vectors of VH^ corresponding to nonzero singular values are used, which means that
is an r x r invertible diagonal matrix. Let
Z1 = Y (YhY)+1/2 V = UrErVrH [V (Sf Sr)_1 K^]1/2
Let C = VrH Vr 1 VrH Vrj then C is Hermitian and invertible. More-
over, we have Zi = Irxr. Since the matrix Y has rank r, the columns of ^ are
orthonormal and the range of ^ is the same as the range of Y. Therefore, we have
tan e(TZ(X),n(Y)) = tan e(TZ(X),n(Z1)).
Moreover, let Ti=X^_Z\ (^X11 . According to Theorem 2.2.1, it follows that
tan ((X)=[)0,0]
where there are dim(TZ(X) Pi TZ{Zi)) 07s on the right. Our task is now to show that
Ti=T. By direct computation, we have
Ti=X^UrErC (XHUrEry
= X^UrErCCj (XHUr^ry
= X^UrEr (XHUrEry.
19
The second equality is based on [XHUrYjrCy = {XHUrYjr^\ since rank [XHY^j =
r. Since the matrix C is invertible, we obtain the last equality. On the other hand,
T = XfY (XHY)j = XfUr^rVrH (XHUrErVrH)j
= XfUr^rVrH (VrH)j (XHUrEry
= X^UrEr (XHUrEry.
Hence, tan0(7^(X)7^(y)) = [5"(T)0..0] which completes the proof.
Next, we provide an alternative way to construct T by adopting only the trian-
gular matrix from one QR factorization.
Corollary 2.2.11 Let X G Cnxp and Y G Cnxq with q < p be matrices of full
rank, and also let XHY he full rank. Let the triangular part R of the reduced QR
factorization of the matrix L = [XY] he
Rn R12
R = .
O R22
Then the positive singular values of the matrix T = R22(Rv2ysatisfy
tan0(^(X),^(y)) = [5+(T),
where there are dim(TZ(X) r\TZ(Y)) 0\s on the right.
Proof: Let the reduced QR factorization of L be
Rn R12
0 R22
where the columns of [Qx Q2] are orthonormal, and TZ(Qi) = X. We directly obtain
Y = Q\R\2 + Q2-R22* Multiplying by on both sides of the above equality for Y,
we get i?i2 = Q\Y. Similarly, multiplying by we have R22 = Combining
these equalities with Theorem 2.2.10 completes the proof.
20
[xy] = [q1q2]
2.2.3 tan in terms of the orthogonal projectors on subspaces
In this section, we present T using orthogonal projectors only. Before we proceed,
we need a lemma and a corollary.
Lemma 2.2.12 [73] If U and V are unitary matrices, then for any matrix A? we
have (UAVy = VHAHjH.
Corollary 2.2.13 Let A e Cnxn be a block matrix, such that
A
B 012
21 22
and 022
21
H 12 H 22
A
Proof: Let the SVD of B be where 2^ is a p x g diagonal matrix, and
U\ and V\ are unitary matrices. Then there exist unitary matrices U2 and V2, such
that
A
Thus,
ih s On v/1
u2 2l 22 _ v2h_
A
v^i s 12 t u? v^i st H 21 u?
21 22 U2H H 12 H 22 u2h
Since St = RSf, we have A
?2 ^2
as desired.
Theorem 2.2.14 Let PxPyi and Py be orthogonal projectors on the subspaces X
X1-and y, correspondingly. Then the positive singular values S+(T) of the matrix
T = Px^Py (PxPy)j satisfy
tan 0(^, 3^) = [oc, ^(T),
21
w/zere ere are addonaZ min (dim()dim( ny))ocs and addonaZ
dim( Pi on e
Proof: Let the matrices [XX_l] and [yy_j_] be unitary, where TZ(X) = X and
TZ(Y) = 3^. By direct calculation, we obtain
XH
PxPy[YY]=
Xf
XHY 0
O 0
The matrix PxPy can be written as
PxPy = [XX]
XHY 0 yh
0 0 Yf
By Lemma 2.2.12 and Corollary 2.2.13, we have
{pxpyy=[YY] {XhYY0 XH
0 0 _xf_
In a similar way, we get
Px^Py = [XX] XfY 0 yh
0 0 Yf
(2.2.2)
(2.2.3)
(2.2.4)
LetT = /VP3(/^P3)tandS = Xfr(XHr)tso
T=[XX] XfY 0 {xhyYo XH =[x x] BO XH
0 0 0 0 00 _xf_
The singular values are invariant under unitary multiplication. Hence, S^(T)=
S"+(_B). From Theorem 2.2.1, we have tan 0(Y) = 10...00,5"+(T)0...0].
Moreover, from Remark 2.2.3 we can obtain the exact mbers for 0 and oc.
Remark 2.2.15 From the proof of Theorem 2.2.1^ we can obtain the properties for
sines and cosines related to orthogonal projectors. According to representation (2.2.2)
22
( XHY 0 \
V O 0
we have S (P^Py) = S
equality (2.2.4) and Property 2.1.4 we have
S(P^Py) = S
[cos 0(^, 3^), 0,..., 0]. Similarly^ from
cos
XfYO
0 0
[l...lsin(ty)0..0]
(y),
where there are nmx(dim(y) dim(X)0) additional Is on the right.
Furthermorewe have
Xf (Px~Py)[YY]= -XfY O
XH 0 xhy
which implies that the singular values of Px Py are the union of the singular values
of XY and XhYisuch that S Py) = [SXiY)SXhYi)' .By Property
2.1.3, it follows that
S(,i)(Px Py) = [sin0(^/2)(<;f,^),sin0(^/2)(<;f,^)],
where 5,(i)(*) denotes the singular values in (0,1);see also [50, 60] for differ-
ent proofs.
We note that the null space of is the orthogonal sum H ).The
range of () is thus y flfl)"1which is the orthogonal complement of the
null space of PxPy- Moreover, (P^Py)1 is an oblique projector, because (P^Py)^ =
^(P^Py)^)2 which can be obtained by direct calculation using equality (2.2.3). The
oblique projector () projects on y along .Thus, we have ()=
(PxPy)^ Consequently, we can simplify T = P^Py (PxPy)^ as T = Pxi. (PxPy)^
in Theorem 2.2.14.
To gain geometrical insight into T, in Figure 2.2 we choose an arbitrary unit
vector z. We project z on y along then project on the subspace which is
23
Figure 2.2: Geometrical meaning of 71=3(;f_Py).
interpreted as Tz. The red segment in the graph is the image of T under all unit
vectors. It is straightforward to see that s(l.i=||T|| = tan(6,).
Using Property 2.1.4 and the fact that principal angles are symmetric with respect
to the subspaces A" and the expressions (3; and 3(;f3;) can
also be used in Theorem 2.2.14.
Remark 2.2.16 According to Remark 2.2,4 and Theorem 2.2.1^, we have
Furthermore^ by the geometrical properties of the oblique projector {PxPyY, it follows
that (PyPy)' = = (PyPy)'Px. Thereforewe obtain
3(3)=()=(-)()=-((/^))'
From equalities above, we see that there are several different expressions for 1 m
Theorem 2.2.14, which a/re summarized in Table 2.2.
)=)=(-)().
Similarlywe have
Table 2.2: Different expressions for T.
Px^Py{PxPy)] {PxPy)]
PyPX(PyPXV Py^PyP^Y
Px Py^ (Px^ Py^)t PAP^Py^Y
PyPxi(PyiPx^y Py(Py^P^Y
(PX-Py)(PXPyY (Py PX)(PyPXY
Theorem 2.2.17 Let ^y) < ir/2. Then,
1.If dim(X) < dim(3^)7 we have PxPy{PxPyV = Px(PxPyV = Px-
2. If dim(X) > dim(3^)7 we have {PxPy)^PxPy = {PxPyVPy = Py-
Proof: Let the columns of matrices X and Y form orthonormal bases for the
subspaces X and 3^, correspondingly. Since 0(^,3^) < tt/2, the matrix XHY has
full rank. Using the fact that XHY(XHYy = / for dim(X) < dim(3^) and com-
bining identities (2.2.2) and (2.2.3), we obtain the first statement. Identities (2.2.2)
and (2.2.3)together with the fact that = / for dim($ dim(V)
imply the second statement.
Remark 2.2.18 From Theorem 2.2.17, for the case dim(X) dim(y) we have
Px^PxPy)] = {PxPy)] ~ Px{PxPy)] = {PXPy)] ~ Px-
Since the angles are symmetric, using the second statement in Theorem 2.2.17 we
have (PXPyyPy = (PXPyy Py.
On the other handfor the case dim(X) dim[y) we obtain
{PXPy)]Py. = (PXPyy ~ Py and (P^x )tp^ = (P^Py^ Px^.
25
To sum upthe following formulas for T in Table 2.3 can also he used in Theo-
rem 2.2.14- alternative proof for T = (PxPyY Py is provided by Drmac in [20]
/or f/ze parcwZar case dim(A") = dim(Y).
Table 2.3: Different formulas for T with 0(Y) < tt/2: dim(< dim(3^) (left);
dim( dim(V) (right).
(PxPyV Px (Pp^PyV Py
(- (
Remark 2.2.19 Finally, we note that our choice of the space = Cn may appear
natural to the reader familiar with the matrix theory^ but in fact is somewhat mis-
leading. The principal angles (and the corresponding principal vectors) between the
subspaces X C H and y C H are exactly the same as those between the subspaces
X C X + y and y C X + yi.e.we can reduce the space H to the space X + y C H
without changing PABS.
This reduction changes the definition of the subspaces X1-and y1-and thus of
the matrices Xj_ and Yj_ that span the column spaces X1-and y-1. All our statements
that use the subspaces X1-and y1-or the matrices Xj_ and Yj_ therefore have their
new analogs, if the space X + y substitutes for %.
2.3 Conclusions
In this chapter, we have briefly reviewed the concept and some important prop-
erties of PABS. We have constructed explicit matrices such that their singular values
are equal to the tangents of PABS. Moreover, we form these matrices using several
different approaches, i.e., orthonormal bases for subspaces, non-orthonormal bases for
subspaces, and projectors. We use some of the statements in the following chapters.
26
3. New approaches to compute PABS2
Computing PABS is one of the most important problems in numerical linear
algebra. Let X G Cnxp and Y G Cnxq be full column rank matrices and let X = TZ(X)
and 3^ = TZ(Y). In this chapter, we are interested in the case n p and n q.
Traditional methods for computing principal angles and principal vectors are based
on computations of eigenpairs of the eigenvalue problems. The cosines of PABS are
eigenvalues of the generalized eigenvalue problem (see [9, 30])
O xhy u XHX 0 u
=A
YhX 0 V 0 yhy V
(3.0.1)
The generalized eigenvalue problem above is equivalent to the eigenvalue problems
for a pair of matrices
(XhX)-1/2XhY(YhY)-1YhX(XhX)~1/2u = X2u,
{YHY)-l/2YHX{XHX)-lXHY{YHY)~l/2v = \2v,
which can be found in most multivariate statistics books, e.g., see [39].
Let the columns of matrices X and Y form orthonormal bases for the subspaces
X and 3^, correspondingly. The matrix on the right side of (3.0.1) is identity, since
XHX I and YHY = I. We notice that solving the eigenvalue problem in (3.0.1) is
the same as solving the SVD of XHY. Computing PABS based on the SVD has been
established by Bjorck and Golub [9]. The cosines of principal angles and principal
vectors are obtained from the SVD of XHY^ which is also called the cosine-based
algorithm. Drmac [21] shows this algorithm to be mixed stable and recommends QR
factorizations with the complete pivoting for computing X and Y from original data.
The cosine-based algorithm in principle cannot provide accurate results for small
angles in computer arithmetic, i.e., angles smaller than 10-8 are computed as zero
2Kahan has attracted our attention to the half angles approach and given us his note [42], via
Parlett. We thank Kahan and Parlett for their help.
27
in double precision arithmetic when EPS ~ 10-16. This problem is pointed out and
treated in [9], where a sine-based algorithm for computing the small principal angles
is proposed. The sines of PABS are obtained from the SVD of (/ XXH)Y. For the
same reason, the sine-based algorithm has trouble computing large, i.e., dose to
tt/2, angles in the presence of round-ofT errors.
The sine and cosine based algorithms combined by Knyazev and Argentati [48],
where small and large angles are computed separately, are being used to cure this
problem. They use sine-based algorithm to compute the angles less than a threshold
(e.g., tt/4) and cosine-based algorithm to compute the angles greater than this thresh-
old. Cosine-based angles are obtained from the SVD of XHY and sine-based angles
are obtained from the SVD of (/ XXH)Y. However, such algorithms introduce a
new difficultyaccurate computation of the principal vectors corresponding to PABS
in the neighborhood of the threshold between the small and large angles.
To compute all angles with high accuracy, another approach is by computing the
sines of half-PABS mentioned in [9, 42]. In this chapter, we propose several new al-
gorithms to compute the sines of half-PABS. Our algorithms provide accurate results
for all PABS in one sweep and can be implemented simply and more efficiently, com-
pared to the sine-cosine algorithms. Moreover, we also propose methods to compute
all PABS for large sparse matrices.
Numerical solution of Hermitian generalized eigenvalue problems requires compu-
tation of PABS in general scalar products, specifically, in an A-based scalar product
for a Hermitian positive definite matrix A, which may only be available through a
function performing the product of A times a vector. Our implementation also allows
computation of PABS in general scalar products.
The remainder of this chapter is organized as follows. We first propose a method
to compute the sines and cosines of half-PABS by one SVD of [XF], which is proved
by using properties of Jordan-Wielandt matrix. In Section 3.2, we provide a method
28
to compute sines of half-PABS based on the geometrical insight view of half-PABS.
Both methods for computing PABS developed in Sections 3.1 and 3.2 are based on
the orthonormal bases of a pair of subspaces.
For better computational efficiency, we propose algorithms adopting only the R
factors of QR factorizations without formulating the orthonormal bases of input ma-
trices in Section 3.3. Furthermore, we provide algorithms to compute all principal
angles, not only for dense matrices, but also for sparse-type structure matrices. Gen-
eralizations of these algorithms to compute PABS in the A-based scalar product are
presented in Section 3.4. A MATLAB implementation and results of numerical tests
are included in Sections 3.5 and 3.6. Finally, we give a brief discussion for the new
algorithms and compare these algorithms with the sine-cosine based algorithm.
3.1 Algebraic approaches to compute sin (and cos (
Theorem 3.1.1 Let the columns of matrices X G Cnxp and Y G Cnxg7 where q < p
and p + q
Let L = [XY] and let the SVD of L be IHVHwhere is a diagonal matrix with
diagonal elements in nonincreasing order. Then
/n
cos
and sin)=
where Sq^(L) denotes the vector of the largest q singular values of L in nonincreasing
order and S^(L) denotes the vector of the smallest q singular values of L in nonin-
creasing order. Moreover^ we have
Ur V2U2-Ur
V O V ^
where = / with Uu Ux G Cpxq; VHV = VHV = / with F, F G Cqxq;
and the matrix \U1JJ2} is unitary. Then, the principal vectors of 0^ associated with
this pair of subspaces are given by the columns of
= XUx and Fy = YV.
29
Alternativelythe principal vectors of 0^ associated with this pair of subspaces are
given by the columns of
and Vy = YV.
Proof: Let the SVD of the matrix XHY be UTjVh ^ where U and V are unitary and S
is a diagonal matrix with the real diagonal elements si(XhY)^ ..., sq(XHY) arranged
in nonincreasing order. Due to [4, 35, 82], the eigenvalues of the (p + q) x (p + q)
Hermitian matrix
0 xhy
yhx o
(3.1.1)
are Si(XHy), ...Sg(XHy) together with (p g) zeros. The matrix in (3.1.1)
is sometimes called the Jordan-Wieiandt matrix corresponding to XHY. Since
(XhY)V = UE and (YHX)U = VEH, it follows that
XHYvk = sk(XHY)uk, (3.1.2)
YHXuk = sk(XHY)vk, (3.1.3)
for fc =1.. .^q. Thus, and ~^k
vk vk
are eigenvectors of (3.1.1) corresponding
to the eigenvalues Sk and correspondingly. Further (see [35, p. 418]), we present
the eigenvalue decomposition of matrix in (3.1.1) as
0 xhy
=Q diag (Si,0,0,_i,...Si)Q
YhX O
where Q G C)x ) is a unitary matrix. Let the first q columns of U be denoted
as [/i, i.e., U\ = [^i,w2, ,^g]. Let U = JJ2] and V = bi2,...Let
U\ = Uq-i^ , ^i]and V = [vq^ ...^Vi]. The matrix Q can be written as
Q =
Ux V2U2 -Ui
V O V
30
(3.1.4)
Consequently, the eigenvalues of
0 xhy
yhx o
(3.1.5)
are [1 + si,1 + S2,...,1 + 1,...,1,11sg_i,...,1si] and the columns of
Q in (3.1.4) are the eigenvectors of (3.1.5).
Furthermore, we can rewrite (3.1.5) as follows:
= [XY]H[XY]- (3.1.6)
0 xhy
yhx o
Let the SVD of the matrix [xy] be UYjVh^ where S is a diagonal matrix and the
matrices U and V are unitary. Since cos2(6,/2)=(1 + cos(6,))/2 and sin2(6,/2)=
(1cos(6l))/2, we have
S = diag (a/^cos (0/2)11v^sin (0/2)).
Moreover, V = Q. By Theorem 2.1.2, the principal vectors of 0^ associated with
this pair of subspaces are given by the columns of = XU\ and Vy = YV. Also,
the principal vectors of 0^ associated with this pair of subspaces are given by the
columns of = XU\ and Vy = YV.
Remark 3.1.2 In Theorem we obtain PABS for the case p + q < n. In a
similar way, we can compute PABS for p + q>n. Suppose p + q>n, then
dim( Pi=dim(+ dim(V) dim( U
> dim(+ dim(V) n
=p + q n.
Thereforethere are at least p + q n zero angles between the subspaces X and y.
Furthermorewe obtain the sines and cosines of half-PABS by the singular values of
31
L as follows:
cos
and
=[5^_)0,0].
There are p + q n zeros after S^^L) in 1 /^[S^^L)) We also note that
the principal vectors are not unique if Sk(L) = s^+i(L).
Theoretically speaking, PABS can be obtained from the eigenvalues and eigen-
vectors of equality (3.1.5). However, this result does not appear to be of practical
interest in computing PABS. Calculating the sine or cosine squared gives inaccu-
rate results for small angles in exact arithmetic. Moreover, we should avoid using
XHY We can achieve a better understanding of this from the following example.
Let x =[1 and y =[1le1]^* They are both normalized in double-precision
and equality (3.1.5) results in a zero angle no matter what, since xHy =1.
Figure 3.1: Sines and cosines of half-angles.
To illustrate Theorem 3.1.1, we again take x =[l ]^ and y =[lle1]^ as
1 1
an example. We have L = . Computing the singular values of L, we get
0 1e-10
sin(6,/2) = 5e 11 and cos(6,/2) = 0. Obviously, we lose accuracy in cos(6,/2) in
this case. As shown in Figure 3.1, cos(6,/2) loses accuracy for small angles in double-
32
precision. Since we always have |sin/(6,/2)| > |cos/(6,/2)| for 0 G [0,tt/2], the better
choice is to use sin(6,/2) to calculate PABS.
Combining Theorem 3.1.1 and Remark 3.1.2, we formulate an algorithm to com-
pute all PABS. This algorithm provides accurate computation for small and large
angles. The Algorithm 3.1 is based on computing sin(6,/2) which is obtained from the
smallest min(p, q) singular values of L. Moreover, the principal vectors are obtained
from the right singular vectors of L corresponding to the smallest min(p, q) singular
values for the case p + q
case p + q>n.
Table 3.1: Algorithm 3.1.
Algorithm 3.1
Input: matrices X1 and Y1 with the same number of rows.
1. Compute orthonormal bases X ∈ C^{n×p} of R(X1) and Y ∈ C^{n×q} of R(Y1), correspondingly.
2. Set L = [X Y] and compute the SVD of L: [U, Σ, V] = svd(L, 0).
3. If p + q ≤ n, take the smallest m = min(p, q) singular values of L, denoted by s_1, ..., s_m.
   If p + q > n, take the smallest n - max(p, q) singular values of L, append p + q - n zero singular values, and denote the result by s_1, ..., s_m.
4. Compute the principal angles, in nonincreasing order, for k = 1, ..., m:
   θ_k = 2 arcsin(s_k/√2).
5. Compute the corresponding principal coefficients:
   U_1 = √2 V(1:p, p+q-m+1:p+q) and V_1 = √2 V(p+1:p+q, p+q-m+1:p+q).
6. Compute the principal vectors:
   V_X = X U_1 and V_Y = Y V_1.
Output: the principal angles θ_1, ..., θ_m between the column spaces of the matrices X1 and Y1, the corresponding principal vectors V_X and V_Y, and the principal coefficients U_1 and V_1.
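For illustration only, a minimal MATLAB sketch of the core of Algorithm 3.1 for the case p + q ≤ n may look as follows; here the built-in orth stands in for ORTHUPDATE.m, and the variable names are illustrative, not those of our actual code.

% Minimal sketch of the core of Algorithm 3.1 (case p + q <= n);
% orth is used in place of ORTHUPDATE.m; names are illustrative only.
X = orth(X1);  Y = orth(Y1);            % orthonormal bases of R(X1), R(Y1)
p = size(X, 2);  q = size(Y, 2);  m = min(p, q);
[~, Sigma, V] = svd([X Y], 0);          % economy-size SVD of L = [X Y]
s = diag(Sigma);                        % singular values, nonincreasing
sk = s(end-m+1:end);                    % smallest m values: sqrt(2)*sin(theta/2)
theta = 2*asin(sk/sqrt(2));             % principal angles, nonincreasing order
U1 = sqrt(2)*V(1:p,     end-m+1:end);   % principal coefficients
V1 = sqrt(2)*V(p+1:end, end-m+1:end);
Vx = X*U1;  Vy = Y*V1;                  % principal vectors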
Remark 3.1.3 In the first step of Algorithm 3.1, we use the MATLAB function ORTHUPDATE.m (see Appendix A) to compute the orthonormal bases of the input matrices. The function ORTHUPDATE.m is backward compatible with the built-in MATLAB function ORTH.m. Our function ORTHUPDATE.m provides three choices for computing an orthonormal basis, selected by the option opts.fast being 0, 1, or 2. With opts.fast = 0, the orthogonalization is performed using the SVD, which performs the same as ORTH.m. With opts.fast = 1, the function ORTHUPDATE.m performs a QR factorization with pivoting. With opts.fast = 2, the function ORTHUPDATE.m only works for a full column rank matrix; the algorithm is as follows: suppose X1 is the input matrix; we compute the upper triangular matrix R from the Cholesky factorization, such that R = chol(X1^H X1), and then take the orthonormal basis as X1 R^{-1}.
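For illustration, the Cholesky-based choice just described amounts to the following two MATLAB lines; this is a sketch of the idea only and assumes that X1 has full column rank and that X1^H X1 is not too ill-conditioned for chol.

% Sketch of the Cholesky-based orthonormalization (the opts.fast = 2 idea);
% assumes X1 has full column rank.
R = chol(X1'*X1);   % upper triangular, so that X1'*X1 = R'*R
Q = X1/R;           % columns of Q are orthonormal: Q'*Q = I, range(Q) = range(X1)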
Let us consider the cost of these three approaches for computing the orthonormal basis of the input matrix. In general, there are two steps to compute the SVD, e.g., see [8, 12, 19]. The first step can be done using Householder reflections to reduce X1 to a bidiagonal matrix. The second step reduces the bidiagonal matrix to diagonal form iteratively by a variant of the QR algorithm. According to [8, p. 90], [12], and [19, Chap. 11], the total cost of the complete SVD is about (3 + a)np² + 11/3 p³ flops, where a = 4 if standard Givens transformations are used in the QR algorithm and a = 2 if fast Givens transformations are used. For details of these transformations, the reader is referred to [8, Section 2.6].
Generally, the cost of the QR factorization based on Householder reflections (see [77]) is about 2np² - 2/3 p³ flops for an n by p matrix with n ≥ p. The Cholesky factorization of the matrix X1^H X1 costs about 1/3 p³ flops. Among these three choices for computing the orthonormal basis, opts.fast = 2 runs fastest, but is the least reliable. The most robust and slowest one is the SVD-based algorithm, since the cost of the SVD is about twice the cost of the QR factorization. For detailed information concerning the stability of these three factorizations, see [77].
In our actual code, we use the economy-size QR and SVD factorizations in the function ORTHUPDATE.m. We do not know the exact number of flops in MATLAB. It is stated in [57] that floating-point operations are no longer the dominant factor in execution speed on modern computer architectures: memory references and cache usage are the most important.
Remark 3.1.4 Golub and Zha [30] point out that PABS are sensitive to perturbations for ill-conditioned X1 and Y1. A proper column scaling of X1 and Y1 can improve the condition numbers of both matrices and reduce the error in PABS, since column scaling does not affect PABS. Numerical tests by Knyazev and Argentati [48] show that proper column scaling improves accuracy when the orthonormal bases are computed using the SVD. However, there is no need to scale the columns when the orthonormal bases are obtained by the QR factorization with complete pivoting, e.g., see [21, 48].
Also, it is pointed out in [21] that one good way to compute the orthogonal bases is to use the unit lower trapezoidal matrices L_X and L_Y from the LU factorizations of X1 and Y1 with pivoting, since R(X1) = R(L_X) and R(Y1) = R(L_Y).
Remark 3.1.5 We note that the matrix size of L in Algorithm 3.1 is n by p + q. If n is large, in order to reduce the cost of the SVD of L, we can reduce the matrix L to a (p + q) by (p + q) triangular matrix R by a QR factorization, such that L = QR. We can then obtain PABS from the SVD of R.
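A minimal MATLAB sketch of this reduction, assuming the orthonormal bases X and Y have already been computed and p + q ≤ n, is given below; it mirrors the triu(qr(.,0)) construction used later in Algorithm 3.3.

% Reduce L = [X Y] to its small (p+q)-by-(p+q) triangular factor first,
% then take the SVD of R instead of L; the singular values coincide.
L = [X Y];
R = triu(qr(L, 0));   % economy-size R factor of L = Q*R
s = svd(R);           % same singular values as svd(L), cheaper for large n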
In the following, we will derive a different formula to compute the sines and
cosines of half-PABS.
Lemma 3.1.6 Let the matrices X and Y be as in Theorem 3.1.1. Let the columns of the matrix Q_{X+Y} form an orthonormal basis for the column space of L, where L = [X Y]. Let L̂ = Q_{X+Y}^H L. Then
[S(L̂), 0, ..., 0] = S(L),
where extra zeros may need to be added on the left-hand side to match the sizes. In particular, if we take Q_{X+Y} = [X X_⊥], where the columns of X_⊥ complete those of X to an orthonormal basis of the column space of [X Y], then
L̂ = [I, X^H Y; 0, X_⊥^H Y].
Proof: By direct computation, we have
L̂^H L̂ = L^H Q_{X+Y} Q_{X+Y}^H L = L^H P_{X+Y} L = L^H L,
where P_{X+Y} is the orthogonal projector onto the column space of L. So the positive singular values of L̂ coincide with the positive singular values of L. If we take Q_{X+Y} = [X X_⊥], then
L̂ = [X X_⊥]^H [X Y] = [I, X^H Y; 0, X_⊥^H Y],
as expected.
Remark 3.1.7 Let the matrix X1 ∈ C^{n×p} have full column rank. Let Y1 ∈ C^{n×q}, and let [Q1 Q2] be an orthonormal matrix obtained from the reduced QR factorization of [X1 Y1]. If we take X = Q1, then we have
L̂ = [I, X^H Y; 0, Q2^H Y].
The size of L̂ is small, since max(size(L̂)) is p + q. However, this L̂ is not attractive for computing PABS in practice. In some situations, we have no information about the rank of the input matrix X1; suppose we use a QR factorization to obtain a full column rank matrix from X1. In this method, to get PABS we have to carry out QR factorizations of X1 and of [X1 Y1], plus one SVD of L̂. The computational cost is similar to that of the method described in Remark 3.1.5. Moreover, applying the orthonormal matrix [Q1 Q2], which is obtained from the QR factorization of [X1 Y1], may increase the inaccuracy for small angles. In particular, if [X1 Y1] is ill-conditioned, we may obtain [Q1 Q2] inaccurately.
Let us highlight that updating the SVD of L by appending or deleting a row or column is well investigated, e.g., see [8, Section 3.4]. In the last lemma of this section, we briefly discuss the relationship between the sines and cosines of half of the principal angles between R(X1) and R(Y1) and those between R(X1) and R([Y1 w]), where X1 ∈ C^{n×p} and Y1 ∈ C^{n×q}. For convenience, let p + q < n and q < p.
Lemma 3.1.8 Let the principal angles between R(X1) and R(Y1), with X1 ∈ C^{n×p} and Y1 ∈ C^{n×q}, where p + q < n and q < p, be θ_1, ..., θ_q, arranged in nondecreasing order. Let the principal angles between R(X1) and R([Y1 w]) be θ̂_1, ..., θ̂_{q+1}, arranged in nondecreasing order. Then
cos(θ̂_1/2) ≥ cos(θ_1/2) ≥ cos(θ̂_2/2) ≥ ... ≥ cos(θ_q/2) ≥ cos(θ̂_{q+1}/2),
sin(θ̂_{q+1}/2) ≥ sin(θ_q/2) ≥ ... ≥ sin(θ̂_2/2) ≥ sin(θ_1/2) ≥ sin(θ̂_1/2).
Proof: Let the columns of the matrices X and Y form orthonormal bases for the subspaces R(X1) and R(Y1), correspondingly. Let the columns of the matrix [Y w̃] form an orthonormal basis for R([Y1 w]). Moreover, let L = [X Y] and L̃ = [X Y w̃]. By Theorem 3.1.1, we have
S(L) = √2 [cos(θ_1/2), ..., cos(θ_q/2), 1/√2, ..., 1/√2, sin(θ_q/2), ..., sin(θ_1/2)]
and
S(L̃) = √2 [cos(θ̂_1/2), ..., cos(θ̂_{q+1}/2), 1/√2, ..., 1/√2, sin(θ̂_{q+1}/2), ..., sin(θ̂_1/2)].
Applying the interlacing property of the singular values [28, p. 449], the results are obtained.
3.2 Geometric approaches to compute sin(Θ/2)
Let x and y be unit vectors. The Hausdorff metric between the vectors x and y is defined as
2 sin(θ(x, y)/2) = inf { ||x q - y|| : q ∈ C, |q| = 1 }.   (3.2.1)
In particular, if x and y in R^n are such that x^H y ≥ 0 (see Figure 3.2), then
2 sin(θ(x, y)/2) = ||x - y||.
Figure 3.2: Sine of half-angle between x and y in R^n.
Similarly, the Hausdorff metric between two k-dimensional subspaces X and Y, e.g., see [66], is defined as
Φ( 2 sin(θ_1(X,Y)/2), ..., 2 sin(θ_k(X,Y)/2) ) = inf { |||X Q - Y||| : Q ∈ U(k) },   (3.2.2)
where U(k) is the family of unitary matrices of size k and the columns of the matrices X and Y form orthonormal bases for the subspaces X and Y, correspondingly. The function Φ(·) is a symmetric gauge function [5] and ||| · ||| denotes the corresponding unitarily invariant norm [55]. See [66] for further information concerning the sines of half-PABS related to the Hausdorff metric between two subspaces. In particular, taking the minimization problem in (3.2.2) for the Frobenius norm, we have (see [73])
|| 2 sin(Θ(X,Y)/2) || = min_{Q ∈ U(k)} || X Q - Y ||_F.
This is a special case of the orthogonal Procrustes problem. The solution Q of the Procrustes problem can be obtained using the orthogonal polar factor of X^H Y [28]: let the SVD of X^H Y be UΣV^H; then the optimal Q is given by Q = UV^H. We use this construction of Q to demonstrate that S(XQ - Y) = 2 sin(Θ(X,Y)/2), which is mentioned in Kahan's lecture notes [42].
Theorem 3.2.1 [42] Let the columns of the matrices X ∈ C^{n×p} and Y ∈ C^{n×q} (q ≤ p) form orthonormal bases for the subspaces X and Y, correspondingly. Let L = XQ - Y and B = X^H Y, where Q^H Q = I and Q is the unitary factor from the polar decomposition of B. Then
sin(Θ/2) = S(L)/2.   (3.2.3)
Proof: Let B = U [C 0]^H V^H be the SVD of B, where U and V are unitary matrices. The matrix C is a diagonal matrix with cos(θ_k) on its diagonal, where θ_k for k = 1, ..., q are the principal angles between the subspaces X and Y. Let Q = U [I 0]^H V^H. We have
L^H L = (XQ - Y)^H (XQ - Y)
      = 2I - Q^H X^H Y - Y^H X Q
      = 2I - 2 V C V^H
      = 2 V (I - C) V^H,
where I - C is a diagonal matrix with 1 - cos(θ_k) for k = 1, ..., q on its diagonal. Note that
Q^H X^H Y = V [I 0] U^H U [C 0]^H V^H = V C V^H.
Therefore, the singular values of L are √(2(1 - cos θ_k)) for k = 1, ..., q. Since sin²(θ/2) = (1 - cos θ)/2, we obtain sin(θ_k/2) = s_k(L)/2.
Remark 3.2.2 We can rewrite L as X U [I 0]^H V^H - Y. Moreover, if we use the reduced SVD of B, such that B = U_r C V_r^H with U_r ∈ C^{p×q} and V_r ∈ C^{q×q}, then L can be written as
X U_r V_r^H - Y  or  X U_r - Y V_r.
The formulation X U_r - Y V_r for computing the sines of half-PABS is also mentioned in [9], but no detailed information is provided there.
Remark 3.2.3 From the proof of Theorem 3.2.1, we also get
cos(Θ/2) = S(XQ + Y)/2.
Moreover, if we use the reduced SVD of B, such that B = U_r C V_r^H, we have
cos(Θ/2) = S(X U_r + Y V_r)/2.
Theoretically speaking, the principal vectors associated with the pair of subspaces X and Y are the columns of the matrices X U_r and Y V_r, correspondingly, where U_r and V_r are obtained from the reduced SVD of B. However, the cosine-based algorithm fails to accurately compute small principal angles and the corresponding principal vectors in double precision, as pointed out in [48]. For example, small angles 10^{-9}, 10^{-10}, 10^{-11}, and 10^{-12} produce the multiple singular value 1 with multiplicity 4 in double precision in the cosine-based algorithm, while in exact arithmetic there are four distinct small angles. This implies that the principal vectors corresponding to small principal angles may not be computed accurately in double precision using the SVD of B.
To obtain the principal vectors corresponding to small angles with high accuracy, one way is to use the SVD of L in Theorem 3.2.1. Let the SVD of L be ŨΣṼ^H. The principal vectors on the subspace Y are the columns of V_Y = YṼ, and the principal vectors V_X on the subspace X can be obtained as V_X = XQṼ.
The formulation for the principal vectors on the subspace X may not be obvious. In our actual code, we reformulate L in Theorem 3.2.1 as X U_r - Y V_r, as in Remark 3.2.2; an alternative approach to get the principal vectors associated with all angles is via the SVD of X U_r - Y V_r. Let the SVD of X U_r - Y V_r be U_1 Σ_1 V_1^H; then the principal vectors associated with the pair of subspaces are the columns of X U_r V_1 and Y V_r V_1. Our numerical tests support this result.
We formulate Algorithm 3.2 to compute all principal angles and principal vectors by using L = X U_r - Y V_r as in Remark 3.2.2.
Table 3.2: Algorithm 3.2.
Algorithm 3.2
Input: matrices X1 and Y1 with the same number of rows.
1. Compute orthonormal bases X and Y of the column spaces of X1 and Y1, correspondingly.
2. Compute the reduced SVD of B = X^H Y: [U_r, C, V_r] = svd(B, 0).
3. Let L = X U_r - Y V_r and compute the singular values s_k of L for k = 1, ..., min(p, q).
4. Compute the principal angles: θ_k = 2 arcsin(s_k/2).
5. Compute the right singular vectors V_1 of L; then the principal coefficients are
   U_1 = U_r V_1 and V_1 = V_r V_1.
6. Compute the principal vectors:
   V_X = X U_1 and V_Y = Y V_1.
Output: the principal angles θ_1, ..., θ_{min(p,q)} between the column spaces of the matrices X1 and Y1, the corresponding principal vectors V_X and V_Y, and the principal coefficients U_1 and V_1.
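For illustration, a minimal MATLAB sketch of Algorithm 3.2 may look as follows; orth again stands in for ORTHUPDATE.m, size(X,2) ≥ size(Y,2) is assumed, and the names are illustrative only.

% Minimal sketch of Algorithm 3.2 (geometric approach); assumes
% size(X,2) >= size(Y,2); names are illustrative only.
X = orth(X1);  Y = orth(Y1);
[Ur, C, Vr] = svd(X'*Y, 0);             % reduced SVD of B = X'*Y
L = X*Ur - Y*Vr;
[~, S, V1] = svd(L, 0);
theta = 2*asin(diag(S)/2);              % since sin(theta_k/2) = s_k(L)/2
Vx = X*(Ur*V1);  Vy = Y*(Vr*V1);        % principal vectors (Remark 3.2.2)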
3.3 Computing sin(Θ/2) and cos(Θ/2) without using orthonormal bases
Our algorithms described above, as well as the cosine-sine based algorithm, compute PABS accurately by using orthonormal bases for the subspaces. If the input matrices are large and sparse, then computing an orthonormal basis for each of the subspaces may destroy the sparsity. The cosine-sine based algorithm coded by Knyazev and Argentati is not capable of handling the sparse-type matrices available in MATLAB. In this section, we propose algorithms to compute the sines of half-PABS by using only the upper triangular matrices R from QR factorizations, without forming orthonormal bases of the input matrices X1 and Y1. Moreover, we provide an algorithm to compute PABS not only for dense matrices but also for sparse matrices.
Theorem 3.3.1 Let X1 ∈ C^{n×p} and Y1 ∈ C^{n×q} be matrices of full rank, where p + q ≤ n and q ≤ p. The principal angles between R(X1) and R(Y1) can be computed as follows.
1. Compute the triangular part R of the reduced QR factorization of the matrix L1 = [X1 Y1], such that
   R = [R11, R12; 0, R22],
   where R11 ∈ C^{p×p}, R12 ∈ C^{p×q}, and R22 ∈ C^{q×q}.
2. Compute the triangular part Ry of the reduced QR factorization of the matrix Y1.
3. Let
   L = [I, R12 Ry^{-1}; 0, R22 Ry^{-1}].
   Then the sines and cosines of half-PABS are obtained from the singular values of L exactly as in Theorem 3.1.1: the largest q singular values of L divided by √2 give cos(Θ/2) and the smallest q singular values of L divided by √2 give sin(Θ/2).
Proof: Let the reduced QR factorization of L1 be
   [X1 Y1] = [Q1 Q2] [R11, R12; 0, R22],   (3.3.1)
where the columns of the matrix [Q1 Q2] ∈ C^{n×(p+q)} are orthonormal. Thus, the matrix X1 can be written as X1 = Q1 R11. Let the reduced QR factorization of Y1 be Qy Ry. Therefore,
   [X1 Y1] = [Q1 Qy] [R11, 0; 0, Ry].   (3.3.2)
Combining (3.3.1) and (3.3.2), we obtain
   [Q1 Qy] = [Q1 Q2] [R11, R12; 0, R22] [R11^{-1}, 0; 0, Ry^{-1}]   (3.3.3)
           = [Q1 Q2] [I, R12 Ry^{-1}; 0, R22 Ry^{-1}].   (3.3.4)
Since the singular values of a matrix are invariant under orthogonal transformations, we have
   S([Q1 Qy]) = S([I, R12 Ry^{-1}; 0, R22 Ry^{-1}]).
According to Theorem 3.1.1, applied to [Q1 Qy], we obtain the result.
Remark 3.3.2 Let the SVD of L in Theorem 3.3.1 be ŨΣṼ^H, and let
U_1 = √2 Ṽ(1:p, q+1:p+q) and V_1 = √2 Ṽ(p+1:p+q, q+1:p+q).
Then the principal vectors associated with this pair of subspaces are given by the columns of V_X = X1 R11^{-1} U_1 and V_Y = Y1 Ry^{-1} V_1, correspondingly.
Remark 3.3.3 Note that the matrix size of L in Theorem 3.3.1 is (p + q) by (p + q). The cost of the SVD of L in Theorem 3.3.1 should be less than the cost of the SVD of L in Theorem 3.1.1. The method discussed in Theorem 3.3.1 may be an efficient alternative for computing PABS.
Remark 3.3.4 The R factor is easy to update whenever a column or row is appended to the matrix L1; see [8, Section 3.2].
Theorem 3.3.5 Under the assumptions of Theorem 3.3.1, we have
cos(Θ) = S(R12 Ry^{-1}),   (3.3.5)
sin(Θ) = S(R22 Ry^{-1}).   (3.3.6)
Proof: From (3.3.1) and (3.3.2), we obtain
Y1 = Q1 R12 + Q2 R22 = Qy Ry.
By multiplying the above equality by Ry^{-1} on the right, we get
Q1 R12 Ry^{-1} + Q2 R22 Ry^{-1} = Qy.   (3.3.7)
By multiplying Q1^H on both sides of (3.3.7), we have
R12 Ry^{-1} = Q1^H Qy.
Note that Q1^H Q2 = 0. From Theorem 2.1.2 of PABS, we obtain equality (3.3.5). Similarly, by multiplying Q2^H on both sides of (3.3.7), we have
R22 Ry^{-1} = Q2^H Qy.
Since R([X1 Y1]) ⊆ R([Q1 Q2]), from Property 2.1.3 we get equality (3.3.6).
Remark 3.3.6 In Theorems 3.3.1 and 3.3.5, the matrix Ry is obtained from the QR factorization of Y1. In fact, we could get Ry from the QR factorization of [R12; R22]. Let the reduced QR factorization of [R12; R22] be Q̃y Ry. Then
Y1 = [Q1 Q2] Q̃y Ry.
Moreover, let B = [Q1 Q2] Q̃y. We have B^H B = I. Hence, Ry is the triangular matrix of the QR factorization of Y1. An alternative method to compute the cosines of PABS using LQ factorizations can be found in [17].
Remark 3.3.7 In the Householder QR factorization, we apply a succession of elementary unitary matrices Q_i on the left of the initial matrix L1 = [X1 Y1], so that the upper triangular matrix R is obtained. Let Q_p denote the p-th unitary matrix. After p steps we have
Q_p Q_{p-1} ··· Q_1 L1 = [R11, R12; 0, L22],
where [R11 R12] contains the first p rows of the final upper triangular matrix R and L22 denotes the remaining (n - p) by q block. Let
L = [I, R12 Ry^{-1}; 0, L22 Ry^{-1}],
where Ry is the upper triangular matrix obtained from the QR factorization of Y1 or of [R12; L22]. Then we have
1. the sines and cosines of half-PABS are obtained from the singular values of L as in Theorem 3.3.1;
2. cos(Θ) = S(R12 Ry^{-1}) and sin(Θ) = S(L22 Ry^{-1}).
To summarize, we formulate Algorithm 3.3 to compute all PABS for dense or sparse matrices. This algorithm provides accurate computation for small and large angles.
Table 3.3: Algorithm 3.3.
Algorithm 3.3
Input: matrices X1 ∈ C^{n×p} and Y1 ∈ C^{n×q} with full column rank.
1. Compute the triangular part R of the QR factorization of L1 = [X1 Y1], e.g.,
   R = qr(L1, 0) if X1 and Y1 are sparse, and R = triu(qr(L1, 0)) otherwise.
2. Compute R12 = R(1:p, p+1:p+q) and R22 = R(p+1:end, p+1:p+q).
3. Compute the triangular part Ry of the QR factorization of [R12; R22].
4. Compute
   L = [speye(p), R12/Ry; sparse(a-p, p), R22/Ry] if X1 and Y1 are sparse,
   L = [eye(p), R12/Ry; zeros(a-p, p), R22/Ry] otherwise,
   where a = min(size(L1)).
5. If p + q ≤ n, compute the smallest m = min(p, q) singular values s_1, ..., s_m of L and the corresponding m right singular vectors V. The principal angles are computed as
   θ_k = 2 arcsin(s_k/√2) for k = 1, ..., m.
6. If p + q > n, compute the smallest n - max(p, q) singular values of L and the corresponding right singular vectors V_1. Moreover, there are p + q - n zero singular values, and the p + q - n right singular vectors V_2 corresponding to the largest p + q - n singular values are computed. Let V = [V_1 V_2]. The principal angles are computed as
   θ_k = 2 arcsin(s_k/√2) for k = 1, ..., n - max(p, q), and θ_k = 0 for k = n - max(p, q) + 1, ..., m.
7. Compute the principal vectors:
   V_X = √2 X1/R(1:p, 1:p) V(1:p, :) and V_Y = √2 Y1/Ry V(p+1:p+q, :).
Output: the principal angles θ_1, ..., θ_m and the corresponding principal vectors V_X and V_Y.
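For illustration, a minimal dense-case MATLAB sketch of the core of Algorithm 3.3 for p + q ≤ n, assuming X1 and Y1 have full column rank, is:

% Minimal dense-case sketch of the core of Algorithm 3.3 (p + q <= n);
% assumes full column rank of X1 and Y1; names are illustrative only.
[n, p] = size(X1);  q = size(Y1, 2);  m = min(p, q);
R   = triu(qr([X1 Y1], 0));             % triangular part of the QR of L1
R12 = R(1:p, p+1:p+q);
R22 = R(p+1:end, p+1:p+q);
Ry  = triu(qr([R12; R22], 0));          % triangular part of the QR of [R12; R22]
L   = [eye(p) R12/Ry; zeros(q, p) R22/Ry];
s   = svd(L);                           % nonincreasing
theta = 2*asin(s(end-m+1:end)/sqrt(2)); % smallest m values give the angles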
Remark 3.3.8 In the first step of Algorithm 3.3, we apply the MATLAB function qr to compute the upper triangular matrix for both the dense-type and sparse-type matrix L1. SuiteSparseQR, developed by Davis [16], has replaced in MATLAB R2009a the prior sparse QR factorization in MATLAB provided by Gilbert [26]; SuiteSparseQR is a multithreaded multifrontal sparse QR factorization method.
In step 3, for dense matrices X1 and Y1 it is cheaper to compute the triangular part Ry from the QR factorization of [R12; R22] than from the QR factorization of Y1. However, factorizing a sparse matrix implies fill-in in the factors, so [R12; R22] may not be as sparse as Y1. Thus, in step 3, we can compute Ry from the sparse matrix Y1. A good column ordering can help limit the worst-case fill-in. A suitable column ordering of A is usually obtained by the minimum degree analysis on A^H A. For a more detailed account of finding a good column ordering, the reader is referred to [33].
Remark 3.3.9 In step 5, if we compute only a few principal angles for a large sparse matrix L, it is better to use the function svds(L) than svd(full(L)). However, to compute all singular values, svd(·) usually performs better than svds(·) in MATLAB; see the SVDS MATLAB function reference for detailed information. In our actual code, we use svd(full(L)) instead of svds(L) to compute all PABS of sparse matrices.
Let us finally present new simple proofs for some properties of the orthogonal projectors on the subspaces X and Y.
Corollary 3.3.10 [23] Let P_X and P_Y be orthogonal projectors on the subspaces X and Y, correspondingly. We have
S_{(0,1)∪(1,2)}(P_X + P_Y) = [1 + cos(Θ_{(0,π/2)}(X,Y)), 1 - cos(Θ_{(0,π/2)}(X,Y))],
where S_{(0,1)∪(1,2)}(·) denotes the vector of singular values in (0,1) ∪ (1,2) and Θ_{(0,π/2)}(X,Y) denotes the vector of principal angles in (0, π/2).
Proof: By direct calculation, we have
P_X + P_Y = [X Y] [X^H; Y^H] = [X Y][X Y]^H.
Let the SVD of [X Y] be ŨΣṼ^H. Then
P_X + P_Y = Ũ Σ Σ^H Ũ^H.
Applying Theorem 3.1.1, the result is obtained directly.
In the following corollary, we present the sines and cosines of half-PABS in terms of these orthogonal projectors.
Corollary 3.3.11 Let P_X and P_Y be orthogonal projectors on the subspaces X and Y, correspondingly. Then
S_{(0,1)∪(1,√2)}([P_X P_Y]) = √2 [cos(Θ_{(0,π/2)}(X,Y)/2), sin(Θ_{(0,π/2)}(X,Y)/2)].
Proof: We have
P_X + P_Y = [P_X P_Y][P_X P_Y]^H.
Thus, the singular values of [P_X P_Y] are the square roots of the eigenvalues of P_X + P_Y. Since the eigenvalues of P_X + P_Y are the same as its singular values, using Corollary 3.3.10 completes the proof.
Remark 3.3.12 Note that we can write
[P_X P_Y] = [X Y] [X^H, 0; 0, Y^H].
From this representation, it is clear that the principal vectors can be obtained directly from the right singular vectors of [P_X P_Y].
3.4 Computing sin(Θ/2) and cos(Θ/2) in the A-based scalar product
Numerical solution of Hermitian generalized eigenvalue problems requires computation of PABS in general scalar products, specifically, in an A-based scalar product for a Hermitian positive definite (HPD) matrix A. In this section, we develop methods to compute PABS in the A-based scalar product based on the sines of half-PABS in the A-based scalar product. To proceed, we need the definitions of the A-based scalar product [69] and of PABS in the A-based scalar product.
For an HPD matrix A, let
(x, y)_A = (Ax, y) denote the scalar product induced by A;
||x||_A = √(Ax, x) = ||A^{1/2} x||_2 denote the corresponding vector norm induced by A;
||B||_A = ||A^{1/2} B A^{-1/2}||_2 denote the corresponding matrix norm induced by A.
The following definition of PABS in the A-based scalar product, whose consistency with the definition of PABS in Section 2.1 is proved by Knyazev and Argentati in [48], is used throughout.
Definition 3.4.1 Let X, Y ⊂ C^n be subspaces with p = dim(X) and q = dim(Y). Let m = min(p, q). The principal angles
Θ^A(X,Y) = [θ^A_1, ..., θ^A_m], where θ^A_k ∈ [0, π/2], k = 1, ..., m,
between X and Y in the A-based scalar product are recursively defined by
s_k = cos(θ^A_k) = max_{x ∈ X} max_{y ∈ Y} |(x, y)_A| = |(x_k, y_k)_A| = |x_k^H A y_k|,
subject to
||x||_A = ||y||_A = 1, x^H A x_i = 0, y^H A y_i = 0, i = 1, ..., k - 1.
The vectors {x_1, ..., x_m} and {y_1, ..., y_m} are called principal vectors relative to the A-based scalar product.
Theorem 3.4.2 [48] Let the columns of the matrices X ∈ C^{n×p} and Y ∈ C^{n×q} form A-orthonormal bases for the subspaces X and Y, correspondingly. Let the SVD of the matrix X^H A Y be UΣV^H, where U and V are unitary matrices and Σ is a p by q diagonal matrix with the real diagonal elements s_1(X^H A Y), ..., s_m(X^H A Y) in nonincreasing order, where m = min(p, q). Then
cos Θ^A(X,Y) = S(X^H A Y) = [s_1(X^H A Y), ..., s_m(X^H A Y)],
where Θ^A is the vector of principal angles in the A-based scalar product between X and Y arranged in nondecreasing order. Moreover, the principal vectors relative to the A-based scalar product associated with this pair of subspaces are given by the first m columns of V_X = XU and V_Y = YV, correspondingly.
Now we derive theorems about the sines and cosines of half-PABS in the A-based scalar product. The following theorem is an extension of Theorem 3.1.1.
Theorem 3.4.3 In the notation of Theorem 3.4.2, let q ≤ p and p + q ≤ n. Let L = A^{1/2}[X Y], and let the SVD of L be ŨΣ̃Ṽ^H, where Σ̃ is a diagonal matrix with diagonal elements in nonincreasing order. Then
S(L) = √2 [cos(Θ^A/2), 1/√2, ..., 1/√2, sin(Θ^A/2)].   (3.4.1)
Moreover, we have
Ṽ = (1/√2) [U_1, √2 U_2, -Û_1; V, 0, V̂],
where U_1^H U_1 = Û_1^H Û_1 = I with U_1, Û_1 ∈ C^{p×q}, V^H V = V̂^H V̂ = I with V, V̂ ∈ C^{q×q}, and the matrix [U_1 U_2] is unitary. Then the principal vectors in the A-based scalar product associated with the angles in nondecreasing order for this pair of subspaces X and Y are given by the columns of
V_X = X U_1 and V_Y = Y V.
Alternatively, the principal vectors in the A-based scalar product associated with the angles in nonincreasing order are given by the columns of
V_X = X Û_1 and V_Y = Y V̂.
Proof: By the assumption that the columns of the matrices X ∈ C^{n×p} and Y ∈ C^{n×q} are A-orthonormal bases for the subspaces X and Y, correspondingly, it follows that X^H A X = I and Y^H A Y = I. Let the SVD of the matrix X^H A Y be UΣV^H, where Σ = diag(s_1, ..., s_q) with s_1 ≥ ... ≥ s_q and s_k = cos θ^A_k(X,Y) for k = 1, ..., q. Using a process similar to the proof of Theorem 3.1.1, the eigenvalues of
[X Y]^H A [X Y] = I + [0, X^H A Y; Y^H A X, 0]   (3.4.2)
are [1 + s_1, ..., 1 + s_q, 1, ..., 1, 1 - s_q, ..., 1 - s_1], and the corresponding eigenvectors are the columns of
Q = (1/√2) [U_1, √2 U_2, -Û_1; V, 0, V̂],   (3.4.3)
where U_1 denotes the first q columns of U, such that U_1 = [u_1, ..., u_q], and U = [U_1 U_2]. Let V = [v_1, v_2, ..., v_q]; then V̂ = [v_q, v_{q-1}, ..., v_1] and Û_1 = [u_q, u_{q-1}, ..., u_1]. Since the eigenvalues of (3.4.2) are equal to the squares of the singular values of L = A^{1/2}[X Y], we obtain (3.4.1). Moreover, let the SVD of L be ŨΣ̃Ṽ^H. It is easy to obtain Ṽ = Q. Based on the definition of PABS in the A-based scalar product via the SVD, the principal vectors in the A-based scalar product associated with the angles in nondecreasing order are the columns of X U_1 and Y V. Similarly, the principal vectors associated with the angles in nonincreasing order are the columns of X Û_1 and Y V̂.
Higham and Al-Mohy [34] give an algorithm to compute the square root of A. It costs about 15 1/3 n³ flops for an n by n Hermitian positive definite matrix A. For computational reasons, when the size of A is large, we need to avoid dealing with the square root of A explicitly. From the proof of Theorem 3.4.3, let L1 = [X Y]^H A [X Y] with p + q ≤ n; then
cos²(Θ^A/2) and sin²(Θ^A/2) are equal to the largest and smallest q eigenvalues of L1, correspondingly, divided by 2.   (3.4.4)
However, using the SVD of the matrix [X Y]^H A [X Y] may not produce small angles accurately, because we need to take the square roots of the singular values of L1 to obtain the sines or cosines of half-PABS. In other words, PABS smaller than 10^{-8} will be computed as zero in double precision.
One way to cure the problem is to replace L in Theorem 3.4.3 by L = R[X Y] or L = K[X Y], where A = R^H R (Cholesky factorization) or A is given in factored form as A = K^H K. For this, we need to know A explicitly or to have A given as K^H K. However, sometimes A may not be available as a matrix but only as a function performing the multiplication of A by a given vector. We can then update L by using the following lemma [48, Lemma 6.1].
Lemma 3.4.4 The SVD of the matrix A^{1/2} L coincides with the SVD of the matrix Q_L^H A L, where Q_L is a matrix with A-orthonormal columns which span the same column space as the columns of the matrix L.
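For illustration, when A is available explicitly and admits a Cholesky factorization, the first option above can be sketched in MATLAB as follows; this is a sketch under the stated assumptions, not the implementation used in our code.

% Sketch: avoid A^(1/2) by using a Cholesky factor, A = R'*R; assumes A is
% given explicitly and the A-orthonormal bases X and Y are already computed.
R = chol(A);                             % upper triangular, A = R'*R
L = R*[X Y];                             % S(L) equals S(A^(1/2)*[X Y])
s = svd(L);                              % half-angle sines/cosines, Theorem 3.4.3
m = min(size(X, 2), size(Y, 2));
thetaA = 2*asin(s(end-m+1:end)/sqrt(2)); % A-based principal angles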
In Theorem 3.2.1, we presented a formula for the sines of half-PABS. The following theorem is an extension of Theorem 3.2.1.
Theorem 3.4.5 In the notation of Theorem 3.4.2, let L = A^{1/2}[X U_1 - Y V], where U_1 is the first q columns of U. Let q ≤ p. Then
sin(Θ^A/2) = S(L)/2.   (3.4.5)
Proof: From Theorem 3.4.2, we have X^H A Y = UΣV^H. Since q ≤ p, we have X^H A Y = U_1 C V^H, where C is a diagonal matrix with cos(θ^A_k) on its diagonal, and θ^A_k for k = 1, ..., q are the principal angles between X and Y in the A-based scalar product. It is worth noting that U_1 C V^H is the thin SVD of X^H A Y. Then
L^H L = (X U_1 - Y V)^H A (X U_1 - Y V)
      = U_1^H X^H A X U_1 - U_1^H X^H A Y V - V^H Y^H A X U_1 + V^H Y^H A Y V
      = 2I - 2C,
where I - C is a diagonal matrix with 1 - cos(θ^A_k) for k = 1, ..., q on its diagonal. Since the singular values of L are the square roots of the eigenvalues of L^H L, the proof is completed.
By analogy with Algorithms 3.1 and 3.2, we can obtain Algorithms 3.4 and 3.5 for computing the principal angles in the A-based scalar product according to Theorems 3.4.3 and 3.4.5. In Algorithms 3.4 and 3.5, we use the function ORTHUPDATE.m (see Appendix A) with the option opts.A = A to compute an A-orthonormal basis, which is compatible with the function ORTHA.m presented in [48].
Table 3.4: Algorithm 3.4.
Algorithm 3.4
Input: matrices X1 and Y1 with the same number of rows.
1. Compute A-orthonormal bases X ∈ C^{n×p} of R(X1) and Y ∈ C^{n×q} of R(Y1).
2. Compute L1 = [X Y] and Q_{L1} = orthupdate(L1, opts.A).
3. Compute the SVD of L = Q_{L1}^H A L1, such that [U, Σ, V] = svd(L, 0).
4. Take the smallest m - (p + q - j) singular values of L, where j = size(Q_{L1}, 2), add p + q - j zeros, and denote the result by s_1, ..., s_m with m = min(p, q).
5. Compute the principal angles in the A-based scalar product:
   θ_k = 2 arcsin(s_k/√2) for k = 1, ..., m.
6. Compute the principal coefficients:
   U_1 = √2 V(1:p, p+q-m+1:p+q) and V_1 = √2 V(p+1:p+q, p+q-m+1:p+q).
7. Compute the principal vectors in the A-based scalar product: V_X = X U_1 and V_Y = Y V_1.
Output: the principal angles θ_1, ..., θ_m between the column spaces of the matrices X1 and Y1 in the A-based scalar product, and the corresponding principal vectors V_X and V_Y.
Table 3.5: Algorithm 3.5.
Algorithm 3.5
Input: matrices X1 and Y1 with the same number of rows.
1. Compute A-orthonormal bases X and Y of R(X1) and R(Y1), correspondingly.
2. Compute the SVD of B = X^H A Y: [U_r, C, V_r] = svd(B, 0), assuming rank(X) ≥ rank(Y).
3. Compute L1 = X U_r - Y V_r and Q_{L1} = orthupdate(L1, opts.A).
4. Compute the SVD of L = Q_{L1}^H A L1.
5. Take the singular values s_k of L for k = 1, ..., m, where m = min(size(L)). Then the principal angles in the A-based scalar product are
   θ_k = 2 arcsin(s_k/2) for k = 1, ..., m, and θ_k = 0 for k = m + 1, ..., min(p, q).
6. Take the right singular vectors V_1 of L; then the principal vectors in the A-based scalar product associated with the pair of subspaces are the columns of
   V_X = X U_r V_1 and V_Y = Y V_r V_1.
Output: the principal angles θ_1, ..., θ_{min(p,q)} between the column spaces of the matrices X1 and Y1 in the A-based scalar product, and the corresponding principal vectors V_X and V_Y.
Remark 3.4.6 The first and second steps of Algorithm 3.5 can be replaced by Cholesky factorizations. Let the matrices X1 and Y1 have full column rank. Let the Cholesky factorizations of X1^H A X1 and Y1^H A Y1 be R_X^H R_X and R_Y^H R_Y, correspondingly, where R_X and R_Y are upper triangular matrices. Then we have B = (R_X^H)^{-1} X1^H A Y1 (R_Y)^{-1}, which implies that cos Θ^A(X,Y) = S(B).
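A minimal MATLAB sketch of Remark 3.4.6, under the assumptions stated there, is given below.

% Sketch of Remark 3.4.6: cosine-based computation of the A-based angles
% without A-orthonormal bases; assumes X1 and Y1 have full column rank and
% A is Hermitian positive definite.
Rx = chol(X1'*A*X1);                    % X1'*A*X1 = Rx'*Rx
Ry = chol(Y1'*A*Y1);                    % Y1'*A*Y1 = Ry'*Ry
B  = (Rx') \ (X1'*A*Y1) / Ry;           % B = inv(Rx')*X1'*A*Y1*inv(Ry)
cosThetaA = svd(B);                     % cosines of the A-based principal angles

As with any cosine-based formulation, this variant is cheap but loses accuracy for small angles in double precision.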
If the input matrices are ill-conditioned, the A-orthonormal bases may be computed inaccurately. Next, we develop an algorithm to compute PABS in the A-based scalar product without forming A-orthonormal bases. To proceed, we use two lemmas. The following lemma links PABS in the A-based scalar product with PABS in the standard scalar product, e.g., see [48].
Lemma 3.4.7 [48] Let A = K^H K for some matrix K. The principal angles between the subspaces X and Y relative to the scalar product (·,·)_A coincide with the principal angles between the subspaces KX and KY relative to the standard scalar product (·,·).
The following lemma links the A-orthogonal QR factorization with the QR factorization in the standard scalar product.
Lemma 3.4.8 Let the A-orthogonal QR factorization of S be Q_A R, where Q_A^H A Q_A = I and R is an upper triangular matrix. Then the QR factorization of A^{1/2} S is QR with respect to the standard inner product, with Q = A^{1/2} Q_A.
Proof: The proof is straightforward, since Q_A^H A Q_A = (A^{1/2} Q_A)^H (A^{1/2} Q_A) = I.
Remark 3.4.9 Alternative ways to compute PABS in the A-based scalar product use the SVD of the triangular part R obtained from the A-orthogonal QR factorizations of [X Y] and [X U_1 - Y V] in Theorems 3.4.3 and 3.4.5, correspondingly.
Theorem 3.4.10 Let X1 ∈ C^{n×p} and Y1 ∈ C^{n×q} be matrices of full rank, where p + q ≤ n and q ≤ p. The principal angles between R(X1) and R(Y1) relative to the A-based scalar product can be computed as follows.
1. Compute the triangular part R of the reduced A-orthogonal QR factorization of the matrix L1 = [X1 Y1], such that
   R = [R11, R12; 0, R22],
   where R11 ∈ C^{p×p}, R12 ∈ C^{p×q}, and R22 ∈ C^{q×q}.
2. Compute the triangular part Ry of the reduced A-orthogonal QR factorization of the matrix Y1.
3. Let
   L = [I, R12 Ry^{-1}; 0, R22 Ry^{-1}].
   Then the sines and cosines of half-PABS in the A-based scalar product are obtained from the singular values of L as in Theorem 3.3.1, and
   cos(Θ^A) = S(R12 Ry^{-1}) and sin(Θ^A) = S(R22 Ry^{-1}).
Proof: Using Lemma 3.4.7, we take K = A^{1/2}. Computing the principal angles between the subspaces X and Y in the A-based scalar product is equivalent to computing the principal angles between the subspaces A^{1/2}X and A^{1/2}Y in the standard scalar product.
Therefore, to compute PABS in the A-based scalar product, we can replace L1 with L1 = A^{1/2}[X1 Y1] in the first statement of Theorem 3.3.1 and replace Y1 with A^{1/2}Y1 in the second statement of Theorem 3.3.1. In other words, we can obtain the triangular matrices R and Ry from the reduced QR factorizations of the matrices A^{1/2}[X1 Y1] and A^{1/2}Y1, correspondingly. Using Lemma 3.4.8, we obtain the result as stated.
Remark 3.4.11 In particular, if [X1 Y1] has full column rank, we can obtain R and Ry from the Cholesky factorizations of the matrices [X1 Y1]^H A [X1 Y1] and Y1^H A Y1, such that [X1 Y1]^H A [X1 Y1] = R^H R and Y1^H A Y1 = Ry^H Ry.
Remark 3.4.12 One approach to constructing the A-orthogonal QR factorization of a matrix is to use the modified Gram-Schmidt process with respect to the A-based scalar product; see the algorithm below. The MATLAB code mgsa.m to compute the triangular part R by the A-orthogonal QR factorization is provided in Appendix A.
According to Theorem 3.4.10, we formulate Algorithm 3.6 to compute all PABS in the A-based scalar product by computing the upper triangular matrices from A-orthogonal QR factorizations.
Algorithm: modified Gram-Schmidt with respect to the A-based scalar product
Input: matrix V = [v_1, ..., v_n].
for i = 1:n
    R(i,i) = sqrt(v_i^H A v_i)
    q_i = v_i / R(i,i)
    for j = i+1:n
        R(i,j) = q_i^H A v_j
        v_j = v_j - R(i,j) q_i
    end
end
Output: triangular matrix R.
Table 3.6: Algorithm 3.6.
Algorithm 3.6
Input: matrices X1 ∈ C^{n×p} and Y1 ∈ C^{n×q} with full column rank.
1. Compute the triangular part R of the A-orthogonal QR factorization of L1 = [X1 Y1].
2. Compute R12 = R(1:p, p+1:p+q) and R22 = R(p+1:p+q, p+1:p+q).
3. Compute the triangular part Ry of the A-orthogonal QR factorization of Y1.
4. Compute L = [eye(p), R12/Ry; zeros(q, p), R22/Ry].
5. Compute the smallest m = min(p, q) singular values s_1, ..., s_m of L and the corresponding m right singular vectors V. The principal angles are computed as
   θ_k = 2 arcsin(s_k/√2) for k = 1, ..., m.
6. Compute the principal vectors:
   V_X = √2 X1/R(1:p, 1:p) V(1:p, :) and V_Y = √2 Y1/Ry V(p+1:p+q, :).
Output: the principal angles θ_1, ..., θ_m and the corresponding principal vectors V_X and V_Y in the A-based scalar product.
Finally, we obtain some properties of A-orthogonal projectors. We use the representation of the A-orthogonal projector as in [48, 72]. Let P_X be the A-orthogonal projector on the subspace X, such that P_X = X X^H A, where X^H A X = I.
Theorem 3.4.13 Let P_X and P_Y be A-orthogonal projectors on the subspaces X and Y, correspondingly. Then
S_{(0,1)∪(1,2)}(A^{1/2}(P_X + P_Y)A^{-1/2}) = [1 + cos(Θ^A_{(0,π/2)}(X,Y)), 1 - cos(Θ^A_{(0,π/2)}(X,Y))].
In particular,
||P_X + P_Y||_A = 1 + cos(θ^A_min(X,Y)),
where θ^A_min(X,Y) is the smallest angle between the subspaces X and Y in the A-based scalar product.
Proof: Let the columns of the matrices X and Y form A-orthonormal bases for the subspaces X and Y, correspondingly. We have
A^{1/2}(P_X + P_Y)A^{-1/2} = A^{1/2}(X X^H A + Y Y^H A)A^{-1/2}
                           = A^{1/2}(X X^H + Y Y^H)A^{1/2}
                           = A^{1/2}[X Y][X Y]^H A^{1/2}.
Let L = A^{1/2}[X Y]; then A^{1/2}(P_X + P_Y)A^{-1/2} = L L^H. Thus, the eigenvalues of A^{1/2}(P_X + P_Y)A^{-1/2} are the squares of the singular values of L. Notice that we may need to add or delete zeros from the singular values of L in order to match the eigenvalues of A^{1/2}(P_X + P_Y)A^{-1/2}. Moreover, A^{1/2}(P_X + P_Y)A^{-1/2} is Hermitian and positive semi-definite, so its singular values are the same as its eigenvalues. From Theorem 3.4.3, we obtain the result.
3.5 MATLAB implementation
In this section, we describe our code SUBSPACEUPDATE.m in detail. Our function has more features compared to the built-in MATLAB function SUBSPACE.m and the function SUBSPACEA.m on MathWorks written by Knyazev and Argentati [48]. Our function returns principal angles, principal vectors, and normalized principal coefficients, and it accepts several options. Below is a list of the options which SUBSPACEUPDATE takes:
opts.fast = 0  Use the algorithm described in Algorithm 3.1, or Algorithm 3.4 for the A-based scalar product.
opts.fast = 1  Use the algorithm described in Algorithm 3.2, or Algorithm 3.5 for the A-based scalar product.
opts.fast = 2  Use the cosine-based algorithm, or the cosine-based algorithm for the A-based scalar product.
opts.A         Use the A-based scalar product, where A is positive definite.
opts.R = 0     Use the orthonormal bases, or the A-orthonormal bases, of the input matrices.
opts.R = 1     Use the upper triangular matrices from the QR factorization, or the A-orthogonal QR factorization, of the input matrices.
opts.number    Find the k smallest principal angles, if the number is k.
opts.disp      Diagnostic information display level 0, 1, 2.
With opts.fast = 0, the code is based on Algorithm 3.1 and Algorithm 3.4. To compute the principal angles in the A-based scalar product, we add one more option, opts.A = A. With opts.fast = 1, the code is based on Algorithm 3.2 and Algorithm 3.5. With opts.fast = 2, the code is based on the cosine-based algorithm, which computes all principal angles, but loses accuracy for small angles. With opts.R = 1, we use Algorithm 3.3 and Algorithm 3.6 to compute principal angles and principal vectors with respect to the standard inner product and the A-based scalar product, correspondingly. Moreover, we use Algorithm 3.3 to compute the principal angles for large sparse-type data. We summarize the options which Algorithms 3.1, 3.2, 3.3, 3.4, 3.5, and 3.6 take in the following table.
Alg. 3.1: opts.fast = 0          Alg. 3.4: opts.fast = 0, opts.A
Alg. 3.2: opts.fast = 1          Alg. 3.5: opts.fast = 1, opts.A
Alg. 3.3: opts.R = 1             Alg. 3.6: opts.R = 1, opts.A
A listing for the function ORTHUPDATE.m is provided in Appendix A.
3.6 Numerical tests
We perform numerical tests of our code SUBSPACEUPDATE.m, based on Algorithms 3.1, 3.2, 3.3, 3.4, 3.5, and 3.6 described in the previous sections, for full-type structure matrices, and of Algorithm 3.3 for sparse-type structure matrices. For comparison, we also run the code SUBSPACEA.m [48], based on the sine-cosine based algorithm, which can be obtained from MathWorks. Numerical tests are performed on Windows Intel Duo Core with a 64-bit operating system, running MATLAB R2011b.
Our first example is taken from [9, 48] with p = 13 and n = 26. The matrix X1 is orthogonal, while Y1 is an n by p Vandermonde matrix. In our code, all computations are performed in double precision in MATLAB. We observe that our results in Table 3.9 coincide with those of [48] to twelve decimal digits and are consistent with those of [9] to four decimal digits.
In the next test, we take the example from [48]. Let p = q < n/2. We define the n by p matrices
X1 = [I O]^H and Y1 = [I D O]^H,
where D is a diagonal matrix of size p such that D = diag(d_1, ..., d_p) with d_k > 0 for k = 1, ..., p, I is the identity matrix of size p, and O are zero matrices of appropriate sizes. The exact values of the sines and cosines of the principal angles between the column spaces of the matrices X1 and Y1 are given by
sin(θ_k) = d_k / √(1 + d_k²) and cos(θ_k) = 1 / √(1 + d_k²)
for k = 1, ..., p. Assuming that the d_k are arranged in nondecreasing order, the angles θ_k are in nondecreasing order as well. Let θ̂_k be the angles computed by our algorithms.
Table 3.9: Computed sines and cosines of principal angles.
k    sin(θ_k)                cos(θ_k)
1 0.9998785422971522 0.01558527040901074
2 0.9981506873301972 0.06078820101183188
3 0.6456213362708716 0.7636577048336605
4 0.4928094246236336 0.8701372713555736
5 0.3975367883303482 0.9175862367777712
6 0.3370430714820327 0.9414892288103764
7 0.2700504602152627 0.9628461709626958
8 0.2156943479780999 0.9764609302221479
9 0.1418470818351141 0.9898885823035148
10 0.1387517672025158 0.9903271919412184
11 0.06089682091193156 0.9981440663565656
12 0.05942261363977260 0.9982329151997635
13 1.072148118598322e-017 1.000000000000000
The collective error in the principal angles is measured as the following sum:
error = || sin(Θ) - sin(Θ̂) || + || cos(Θ) - cos(Θ̂) ||,   (3.6.1)
where Θ denotes the vector of exact angles and Θ̂ the vector of computed angles.
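For illustration, the test problem and the error measure (3.6.1) can be set up as in the following MATLAB sketch; the call to SUBSPACEUPDATE.m is hypothetical, since we do not reproduce its exact interface here, and the use of the Euclidean vector norm is an assumption.

% Sketch of the test problem; the call to subspaceupdate is hypothetical,
% and the norm used in the error is an assumption.
n = 200;  p = 20;
d = rand(p, 1);                            % diagonal of D, uniform on (0,1)
X1 = [eye(p); zeros(n-p, p)];
Y1 = [eye(p); diag(d); zeros(n-2*p, p)];
sinExact = sort(d ./ sqrt(1 + d.^2));      % exact sines, ascending
cosExact = sort(1 ./ sqrt(1 + d.^2), 'descend');
theta = subspaceupdate(X1, Y1);            % hypothetical call to our code
err = norm(sort(sin(theta(:))) - sinExact) + ...
      norm(sort(cos(theta(:)), 'descend') - cosExact);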
We multiply the matrices X1 and Y1 by a random orthogonal matrix U1 of size n on the left to get
X2 = U1 X1 and Y2 = U1 Y1.
Also, we multiply these matrices by random orthogonal p by p matrices U_X and U_Y, correspondingly, on the right to obtain
X3 = X2 U_X and Y3 = Y2 U_Y.
We highlight that the principal angles and the condition numbers are invariant with respect to these two transformations.
Figure 3.3: Errors in principal angles: p = 20 (top) and p as a function of n such that p = n/2 (bottom); left for Algorithm 3.1 and Algorithm 3.4 with A = I, and right for the sine-cosine algorithm and the sine-cosine algorithm with A = I.
In the following tests, we check the scalability of the code for Algorithms 3.1, 3.2, 3.3, and for Algorithms 3.4, 3.5, 3.6 with A = I, for well-conditioned cases. We plot the collective error (3.6.1) for the principal angles between X3 and Y3 as the size n of the problem increases.
In our first two tests, we use the same D as in [48]. The diagonal entries of D are chosen as uniformly distributed random numbers on the interval (0,1). In the top part of Figure 3.3, we fix p = q = 20, increase the size n of the problem, and plot the collective error for the principal angles between X3 and Y3 against n/2. The left graph of Figure 3.3 corresponds to Algorithm 3.1 and the right graph to the sine-cosine based algorithm of [48]. We make the same observation as in [48]: the average error grows approximately two times with a ten-fold increase of the problem size in both graphs. The behavior of Algorithm 3.1 and the sine-cosine based algorithm is very similar. In the bottom part of Figure 3.3, we increase p = q = n/2. We observe that the average error grows at the same pace as the problem size in both the left and right graphs. Algorithm 3.4 and the sine-cosine based algorithm with A = I behave very similarly in Figure 3.3. Other analogous tests confirm the similar behavior of these algorithms. We also test Algorithms 3.2, 3.3 and Algorithms 3.5, 3.6 with A = I. We obtain graphs similar to those displayed in Figure 3.3, so these graphs are not shown here.
Figure 3.4: Errors in the principal angles for Algorithms 3.1, 3.2, and the sine-cosine based algorithm with D = diag(10^(-16 rand(1,p))) and p = 20.
In the second test, we focus on small angles. We set p = q = 20 and select every diagonal element of D in the form 10^(-16 rand()), where rand() is again a uniformly distributed random number on the interval (0,1). We increase the size n of the problem and plot the collective error for the principal angles between X3 and Y3. We compare the collective errors of our new Algorithms 3.1 and 3.2 with those of the cosine-sine based algorithm in Figure 3.4. In this figure, we get good accuracy, with errors less than 6 × 10^{-15}, for all three methods. Moreover, among these three methods, Algorithm 3.1 is the most accurate and Algorithm 3.2 is the least accurate.
In the third test, we turn our attention to the errors for individual PABS,
| sin(θ_k) - sin(θ̂_k) | + | cos(θ_k) - cos(θ̂_k) | for k = 1, ..., p,
instead of the sum of errors in (3.6.1). We fix a small p = q and n = 100, and compute the principal angles between X2 and Y2 and between X3 and Y3 in 500 runs. In [48], several different combinations of angles less than π/4 are tested. One of the worst-case scenarios takes
D = diag(1, 0.5, 10^{-11}, 10^{-12}, 10^{-13}, 5 × 10^{-15}, 2 × 10^{-15}, 10^{-15}, 10^{-16}, ...).
We repeat this test for our Algorithms 3.1, 3.2 and the sine-cosine based algorithm; the results are presented in Figure 3.5. The top three graphs of Figure 3.5 show the distributions of the errors for the individual angles between X3 and Y3 for Algorithm 3.1, Algorithm 3.2, and the sine-cosine based algorithm in 500 runs. In Figure 3.5, smaller angles are further away from the viewer. Again, we observe that Algorithm 3.1 achieves the best stability and Algorithm 3.2 the least stability for small angles among these three algorithms. The three graphs at the bottom of Figure 3.5 demonstrate the performance of Algorithm 3.4, Algorithm 3.5, and the sine-cosine based algorithm with A = I for the same problem.
Surprisingly, the errors for the small individual angles get smaller for Algorithm 3.4 with A = I. The main reason is the following. We use Q_{L1}, which has A-orthonormal columns, in Algorithm 3.4. Q_{L1} should be returned as an n by k matrix in exact arithmetic, but in double precision Q_{L1} is returned as an n by k̂ matrix, where k̂ < k in this case. In our code, we then add k - k̂ zero angles. In this test, two or three zero angles are returned by Algorithm 3.4. Since the exact three smallest angles are less than 4 × 10^{-16}, the errors for the three smallest angles are less than 4 × 10^{-16} for Algorithm 3.4.
Also, we perform the same test for Algorithms 3.3 and 3.6 (see Figure 3.6), which do not require orthonormal bases and A-orthonormal bases, correspondingly. In Figure 3.6, we get high accuracy for both algorithms.
Figure 3.5: Error distributions for individual angles: top three for Algorithm 3.1, Algorithm 3.2, and the sine-cosine algorithm; bottom three for Algorithms 3.4, 3.5, and the sine-cosine algorithm with A = I.
Figure 3.6: Errors in the principal angles for Algorithms 3.3 and 3.6.
In the next test, we analyze the stability of our algorithms for ill-conditioned scalar products. We take A = A_k = 10^{-k} I + H for k = 1, ..., 16, as in [48], where I is the identity matrix and H is the Hilbert matrix of order twenty with h_{ij} = 1/(i + j - 1) for i, j = 1, ..., 20. We take X1 and Y1 as G and F in [48], such that X1 = eye(20, 10) and Y1 is equal to the last ten columns of the Vandermonde matrix with elements y_{ij} = i^{20-j} for i, j = 1, ..., 20, with κ(Y1) ≈ 10^{13}. We apply the LU factorization to Y1 with partial pivoting in order to reduce the condition number of Y1, and thus obtain the lower trapezoidal matrix Y2, such that R(Y1) = R(Y2). As k increases, the subspaces X and Y remain the same, but the condition number of A increases, since κ(A) ≈ 10^k. We plot the sines of all ten principal angles as functions of k in logarithmic scale. Also, as in [48], we plot the error, which is defined as
error = ||V_X^H A V_X - I|| + ||V_Y^H A V_Y - I|| + || diag(cos(Θ)) - V_X^H A V_Y ||,
where V_X and V_Y are the principal vectors of the pair of subspaces.
Figure 3.7: Sines of the angles as functions of k for Algorithms 3.4, 3.5, and 3.6 (top and bottom left); errors as functions of the condition number for Algorithms 3.4, 3.5, and 3.6 (bottom right).
In Figure 3.7, we observe that the results of our new algorithms are quite similar to those of the sine-cosine based algorithm in [48]. The error increases as the condition number of A increases. Furthermore, among Algorithms 3.4, 3.5, and 3.6, the error of Algorithm 3.5 is the smallest for large k. In particular, for k = 16 and κ(A) ≈ 10^{18}, our code for Algorithm 3.4 returns zeros for the 6 smallest angles, which leads to the gaps in Figure 3.7 for Algorithm 3.4. The reason is the same as explained for the previous test: we lose accuracy in the A-orthonormalization step. In Figure 3.7 (bottom), we observe that Algorithm 3.6 is not stable for ill-conditioned A. The reason is that the modified Gram-Schmidt algorithm with respect to the A-based scalar product is not stable for an ill-conditioned matrix.
3.7 Discussion
In this section, we briefly discuss the algorithms presented in the previous sections and give a short summary of the comparison of these algorithms with the sine-cosine based algorithm. The algorithms to compute PABS and PABS in the A-based scalar product are summarized in Table 3.10.
Table 3.10: Algorithms to compute PABS and PABS in the A-based scalar product.
Alg. 3.1: X = orthupdate(X1); Y = orthupdate(Y1); SVD of [X Y].
Alg. 3.4 (A-based): X = orthupdate(X1, opts.A); Y = orthupdate(Y1, opts.A); L1 = [X Y]; Q_{L1} = orthupdate(L1, opts.A); SVD of Q_{L1}^H A L1.
Alg. 3.2: X = orthupdate(X1); Y = orthupdate(Y1); SVD of X^H Y = UΣV^H; SVD of XU - YV.
Alg. 3.5 (A-based): X = orthupdate(X1, opts.A); Y = orthupdate(Y1, opts.A); SVD of X^H A Y = UΣV^H; L1 = XU - YV; Q_{L1} = orthupdate(L1, opts.A); SVD of Q_{L1}^H A L1.
Alg. 3.3: R = qr([X1 Y1]); Ry = qr([R12; R22]); SVD of [I, R12/Ry; 0, R22/Ry].
Alg. 3.6 (A-based): R from the A-orthogonal QR of [X1 Y1]; Ry from the A-orthogonal QR of Y1; SVD of [I, R12/Ry; 0, R22/Ry].
Sine-cosine: X = orthupdate(X1); Y = orthupdate(Y1); SVD of X^H Y; SVD of (I - XX^H)Y.
Sine-cosine (A-based): X = orthupdate(X1, opts.A); Y = orthupdate(Y1, opts.A); SVD of X^H A Y; L1 = (I - XX^H A)Y; Q_{L1} = orthupdate(L1, opts.A); SVD of Q_{L1}^H A L1.
We remark that we use the function ORTHUPDATE.m with its default value opts.fast = 0, which is backward compatible with the MATLAB built-in function orth, to compute the orthonormal bases. The function orthupdate.m with the option opts.A is backward compatible with the ortha.m function of [48], which, based on our MATLAB tests, is more expensive than the A-orthogonal QR factorization using the function mgsa.m.
Next, we summarize the comparison of the algorithms to compute PABS in Table 3.11 and PABS in the A-based scalar product in Table 3.12.
Table 3.11: Comparison of algorithms to compute PABS.
Alg. 3.1. Advantage: the most accurate. Disadvantage: expensive for very large n.
Alg. 3.2. Advantage: the cost is the same as the cost of the sine-cosine based algorithm. Disadvantage: less accurate for small angles compared to Alg. 3.1 and the sine-cosine algorithm.
Alg. 3.3. Advantage: worthwhile for sparse matrices.
Sine-cosine. Advantage: the cost is the same as that of Alg. 3.2. Disadvantage: cannot compute with good accuracy the principal vectors corresponding to PABS in the neighborhood of the threshold.
Table 3.12: Comparison of algorithms to compute PABS in the A-based scalar product (1).
Alg. 3.4. Advantage: if A is available in factored form R^H R or K^H K, PABS can be obtained with the highest accuracy. Disadvantage: if the A-orthonormalization loses accuracy for [X Y], then those angles are returned as zeros.
Alg. 3.5. Advantage: good accuracy for an ill-conditioned matrix A.
Alg. 3.6. Advantage: cheap. Disadvantage: loses accuracy for an ill-conditioned matrix A; the accuracy depends on the MGS algorithm.
Sine-cosine. Advantage: good accuracy for an ill-conditioned matrix A. Disadvantage: loses accuracy for the principal vectors corresponding to PABS near the threshold.
3.8 Conclusions
In this chapter, we have derived several new methods to compute all PABS and PABS in the A-based scalar product accurately. The algorithms are based on computing the sines of half angles using algebraic and geometric approaches. We have also described a numerically robust and fast method to compute PABS of sparse-type and dense-type matrices using the triangular matrices from QR factorizations. We conclude this work with several numerical tests. The code is robust and can provide accurate angles not only for large-scale matrices but also for large sparse matrices.
There remain a number of problems associated with this technique, such as determining an orthonormal column basis in the A-based scalar product to make the fourth step in Algorithm 3.5 more accurate. Though alternative ways to compute an A-orthonormal basis are available, such as a variant of the Gram-Schmidt algorithm [72] or the modified Gram-Schmidt we use for the A-orthogonal QR factorization, those methods do not fit our purpose well.
4. Majorization inequalities
Majorization is a topic of much interest in various areas of mathematics, e.g., matrix theory, numerical analysis, combinatorics, statistics, and probability. Majorization is one of the most powerful techniques for deriving inequalities in a concise way. In particular, majorization relations among the eigenvalues and singular values of matrices produce a variety of norm inequalities in matrix theory. For a general discussion of majorization, see Marshall, Olkin and Arnold [55], Bhatia [5], Ando [1], and Horn and Johnson [36].
In this chapter, we present weak majorization inequalities for singular values and eigenvalues of Hermitian matrices. We are concerned with the singular values of the difference of two Hermitian matrices A - B bounded by the singular values of AT - TB and T^{-1}A - BT^{-1}, where T is positive definite. Extensions of these bounds to eigenvalues are provided. We also present relations involving the difference of the eigenvalues of A and B and the matrix AT - TB, where T is invertible. Our work is inspired by Bhatia, Kittaneh and Li [7] and by Bhatia, Davis, and Kittaneh [6].
Also, in this chapter we consider the absolute change in the principal angles between the subspaces X and Y when the subspaces are perturbed. We use weak majorization to measure the change. We also provide some basic results involving the absolute change of the tangents of PABS and present a perturbation analysis for the sines and cosines of half-PABS.
4.1 Definition and properties of majorization
Definition 4.1.1 (Weak majorization and majorization)
Let x↓ and y↓ ∈ R^n be the vectors obtained by rearranging the coordinates of the vectors x and y in algebraically nonincreasing order, denoted by x_[1], ..., x_[n] and y_[1], ..., y_[n], such that x_[1] ≥ x_[2] ≥ ... ≥ x_[n] and y_[1] ≥ y_[2] ≥ ... ≥ y_[n]. We say that x is weakly majorized by y, using the notation x ≺_w y, if
sum_{i=1}^{k} x_[i] ≤ sum_{i=1}^{k} y_[i],  1 ≤ k ≤ n.
If in addition sum_{i=1}^{n} x_[i] = sum_{i=1}^{n} y_[i], we say that x is majorized or strongly majorized by y, using the notation x ≺ y.
For example, we have the strong majorization [0.5, 0.5] ≺ [1, 0]. Moreover, all vectors with sum 1 majorize the vector with uniform components, i.e., [1/n, ..., 1/n] ≺ x for any x ∈ R^n with sum_i x_i = 1.
Geometrically, for x, y ∈ R^n, the relation x ≺ y holds if and only if x is in the convex hull of all vectors obtained by permuting the coordinates of y, e.g., see [55]. In Figure 4.1 (left), for a fixed vector y ∈ R^2, all vectors x on the blue segment are majorized by y; in particular, [(y_1+y_2)/2, (y_1+y_2)/2] ≺ x ≺ y. In Figure 4.1 (right), for a fixed vector y ∈ R^3, all vectors x in the region bounded by the blue edges are majorized by y; in particular, [(y_1+y_2+y_3)/3, (y_1+y_2+y_3)/3, (y_1+y_2+y_3)/3] ≺ x ≺ y.
Figure 4.1: Geometry of majorization in 2D and 3D.
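For illustration, weak majorization of two real vectors can be checked numerically with the following minimal MATLAB helper; the function name and tolerance are illustrative only.

% Minimal helper to check weak majorization x <=_w y numerically.
function tf = isweaklymajorized(x, y)
    xs = sort(x(:), 'descend');
    ys = sort(y(:), 'descend');
    tf = all(cumsum(xs) <= cumsum(ys) + 1e-12);   % small tolerance for rounding
end

For example, isweaklymajorized([0.5 0.5], [1 0]) returns true; strong majorization additionally requires that the total sums of the two vectors coincide.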
We remark that x ≤ y means x_i ≤ y_i for i = 1, ..., n. Therefore, x ≤ y implies x ≺_w y. The relations ≺ and ≺_w are reflexive and transitive [3, 5], but y ≺ x and x ≺ y together do not imply x = y. By the characterization of majorization, Polya proves in [65] that nondecreasing convex functions preserve weak majorization. In other words, if x ≺_w y, then f(x) ≺_w f(y), where the function f is applied componentwise, with the notation f(x) = [f(x_1), f(x_2), ..., f(x_n)].
The following two results provide ways to preserve majorization; these basic majorization inequalities for vectors can be obtained by straightforward algebra. Let the nonnegative vectors x, y, and u be decreasing and of the same size. If x ≺_w y, then the componentwise product satisfies x u ≺_w y u. However, the converse is not true in general. Moreover, if the nonnegative decreasing vectors additionally satisfy x ≺_w y and u ≺_w v, then x u ≺_w y v.
4.2 Some known majorization results
In this section, we briefly review some existing majorization inequalities for singular values and eigenvalues of matrices. We provide these well-known majorization results as they serve as a foundation for our new majorization inequalities for singular values, eigenvalues, and the absolute change in PABS.
In the rest of this thesis, Λ(·), S(·), and Θ denote ordered vectors: Λ(·) is arranged in nonincreasing order, S(·) is arranged in nonincreasing order, and Θ is arranged in nondecreasing order.
Theorem 4.2.1 (Lidskii-Mirsky-Wielandt Theorem) [5, 55] If A and B are Hermitian matrices, then
Λ(A) - Λ(B) ≺ Λ(A - B).
Theorem 4.2.2 (Fan) [55, p. 287] For any square matrices A and B, we have
S(A + B) ≺_w S(A) + S(B).
Theorem 4.2.3 (Mirsky) [5] For any square matrices A and B, we have
|S(A) - S(B)| ≺_w S(A - B).
Theorem 4.2.4 (Lidskii) [5, p. 71] For Hermitian matrices A and B,
|Λ(A) - Λ(B)| ≺_w S(A - B).
Let the eigenvalues of A and B be in nonincreasing order, such that Λ(A) = [λ_1(A), ..., λ_n(A)] and Λ(B) = [λ_1(B), ..., λ_n(B)]. Two inequalities follow from Theorem 4.2.4. One is Weyl's theorem:
max_k |λ_k(A) - λ_k(B)| ≤ ||A - B||.
The other (see Stewart and Sun [73]) is
sum_k |λ_k(A) - λ_k(B)|² ≤ ||A - B||_F²,
which is a special case of the Hoffman-Wielandt theorem established for normal matrices.
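For illustration, Theorem 4.2.4 can be checked numerically on random Hermitian matrices with the following MATLAB sketch; the matrix size and tolerance are arbitrary choices.

% Sketch: numerical check of Lidskii's theorem (Theorem 4.2.4).
n = 50;
A = randn(n);  A = (A + A')/2;
B = randn(n);  B = (B + B')/2;
dl  = sort(eig(A), 'descend') - sort(eig(B), 'descend');
lhs = sort(abs(dl), 'descend');
rhs = svd(A - B);                                  % nonincreasing
assert(all(cumsum(lhs) <= cumsum(rhs) + 1e-10));   % |Lambda(A)-Lambda(B)| <=_w S(A-B)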
The results mentioned above describe the eigenvalues and the singular values of the difference or the sum of matrices. Next we turn to the singular values of the product of matrices.
Theorem 4.2.5 [51] For general matrices A and B, we have
S(AB) ≺_w S(A) S(B),
where the product of the vectors of singular values is taken componentwise. If needed, we add zeros to match the sizes of the vectors on either side.
Theorem 4.2.6 [5, p. 94, Theorem IV.2.5] Let A and B be square matrices. Then, for any t > 0, we have
S^t(AB) ≺_w S^t(A) S^t(B),
where S^t(A) = [s_1^t(A), ..., s_n^t(A)].
We can see that Theorem 4.2.6 is a generalization of Theorem 4.2.5 if A and B are square matrices. However, Theorem 4.2.5 also works for rectangular matrices.
4.3 Unitarily invariant norms, symmetric gauge functions, and majorization
In this section, we discuss the relationship between unitarily invariant norms, symmetric gauge functions, and weak majorization.
Definition 4.3.1 (Unitarily invariant norm) A norm ||| · ||| on the space of matrices is called unitarily invariant if
|||UAV||| = |||A|||
for all unitary matrices U and V.
For example, the 2-norm, the Ky Fan k norms, the Schatten p-norms, and the Frobenius norm are all unitarily invariant norms.
Ky Fan k norms: ||A||_(k) = s_1(A) + ... + s_k(A), where k = 1, ..., min(m, n).
Schatten p-norms: ||A||_p = ( sum_{i=1}^{min(m,n)} s_i^p(A) )^{1/p} for p ≥ 1.
The Frobenius norm is a special case of a Schatten p-norm for p = 2.
Definition 4.3.2 (Symmetric gauge function) A function Φ : R^n → R is called a symmetric gauge function if
1. Φ is a norm,
2. Φ(Pξ) = Φ(ξ) for any ξ ∈ R^n and any permutation matrix P,
3. Φ is absolute, i.e., Φ(|ξ|) = Φ(ξ).
The relationship between unitarily invariant norms and symmetric gauge functions is well investigated by von Neumann, e.g., see [35, p. 438, Theorem 7.4.24]. He shows that if Φ is a symmetric gauge function, then the matrix function defined by Φ(S(A)) is a unitarily invariant norm. Conversely, for every unitarily invariant norm, there is a corresponding symmetric gauge function Φ such that |||A||| = Φ(S(A)).
Considering the singular value decomposition, it is immediately apparent that a unitarily invariant norm of a matrix A is a function of the singular values of A. Let the SVD of A be UΣV^H; then |||A||| = |||UΣV^H||| = |||Σ|||. The following theorem provides an intimate link between unitarily invariant norm inequalities and weak majorization inequalities.
Theorem 4.3.3 [5] Let $A$ and $B$ be square matrices. Then $S(A) \prec_w S(B)$ if and
only if $|||A||| \le |||B|||$ for every unitarily invariant norm.
From this theorem, we see that the inequalities of unitarily invariant norms are
equivalent to weak majorization of singular values of matrices. For a general discus-
sion of the connection between unitarily invariant norms, symmetric gauge functions,
and majorization, the reader is referred to [5].
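In particular, it suffices to test all Ky Fan $k$ norms, i.e., to compare partial sums of
singular values. A small Python/NumPy sketch of this test follows; the matrices below are
arbitrary, and the construction of $A$ only serves to guarantee $S(A) \prec_w S(B)$ for the
purpose of the example:

    import numpy as np

    rng = np.random.default_rng(2)
    B = rng.standard_normal((5, 5))
    A = 0.5 * B @ np.diag([1.0, 0.9, 0.8, 0.7, 0.6])   # then s_i(A) <= s_i(B) for all i

    sA = np.linalg.svd(A, compute_uv=False)
    sB = np.linalg.svd(B, compute_uv=False)

    # S(A) <_w S(B): every Ky Fan k norm of A is bounded by that of B
    print(np.all(np.cumsum(sA) <= np.cumsum(sB) + 1e-12))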
Theorem 4.3.4 [5, p. 254] Let $A$ and $B$ be square matrices and $AB$ be a Hermitian
matrix. Then
$|||AB||| \le |||\mathrm{re}(BA)|||$
for every unitarily invariant norm, or equivalently
$S(AB) \prec_w S(\mathrm{re}(BA)),$
where $\mathrm{re}(A)$ denotes $(A + A^H)/2$.
Theorem 4.3.5 [7] Let $A$ and $B$ be square matrices and $AB$ be a normal matrix.
Then
$|||\,|AB|_{pol}^{1/2}\,||| \le |||\,|BA|_{pol}^{1/2}\,|||$
for every unitarily invariant norm, or equivalently
$S\big(|AB|_{pol}^{1/2}\big) \prec_w S\big(|BA|_{pol}^{1/2}\big),$
where $|A|_{pol} = (A^H A)^{1/2}$.
Theorem 4.3.6 [5, p. 236, VIII.3.9] Let $A$ and $B$ be square matrices such that $A =
J D_1 J^{-1}$ and $B = T D_2 T^{-1}$, where $J$ and $T$ are invertible matrices, and $D_1$ and $D_2$
are real diagonal matrices. Then
$|||\mathrm{diag}(\Lambda(A)) - \mathrm{diag}(\Lambda(B))||| \le [\kappa(J)\kappa(T)]^{1/2}\,|||A - B|||$
for every unitarily invariant norm, or equivalently
$|\Lambda(A) - \Lambda(B)| \prec_w [\kappa(J)\kappa(T)]^{1/2}\,S(A - B),$
where $\mathrm{diag}(\Lambda(A))$ denotes a diagonal matrix whose diagonal entries, arranged in
decreasing order, are the eigenvalues of $A$, and $\kappa(\cdot)$ denotes the condition number.
Theorem 4.3.7 [6] Let $A$ and $B$ be Hermitian and $T$ be positive definite. Then
$s_{\min}(T)\,|||A - B||| \le |||AT - TB|||$
for every unitarily invariant norm, or equivalently,
$s_{\min}(T)\,S(A - B) \prec_w S(AT - TB),$
where $s_{\min}(T)$ denotes the smallest singular value of $T$.
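The equivalent weak majorization form is easy to check numerically; an illustrative
Python/NumPy sketch with random Hermitian $A$, $B$ and a random positive definite $T$,
all of which are assumptions of the example:

    import numpy as np

    rng = np.random.default_rng(3)
    n = 5
    A = rng.standard_normal((n, n)); A = (A + A.T) / 2
    B = rng.standard_normal((n, n)); B = (B + B.T) / 2
    M = rng.standard_normal((n, n)); T = M @ M.T + n * np.eye(n)   # positive definite

    s_min = np.linalg.svd(T, compute_uv=False)[-1]
    lhs = s_min * np.linalg.svd(A - B, compute_uv=False)     # s_min(T) * S(A - B)
    rhs = np.linalg.svd(A @ T - T @ B, compute_uv=False)     # S(AT - TB)

    print(np.all(np.cumsum(lhs) <= np.cumsum(rhs) + 1e-10))  # weak majorization holds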
4.4 New majorization inequalities for singular values and eigenvalues
In this section, we apply the results of Sections 4.2 and 4.3 to derive weak ma-
jorization inequalities for the difference of singular values and eigenvalues of matrices.
Lemma 4.4.1 If $A$ and $B$ are square matrices, then
$S\big(|AB|_{pol}^{1/2}\big) \prec_w S^{1/2}(A)\,S^{1/2}(B).$
Proof: First, we show that $S\big(|T|_{pol}^{1/2}\big) = S^{1/2}(T)$ for all $T \in \mathbb{C}^{n \times n}$. Since $|T|_{pol}$ is
positive semi-definite and Hermitian, it follows that $\Lambda^{1/2}(|T|_{pol}) = S^{1/2}(|T|_{pol})$. Also,
it follows that $|T|_{pol}^{1/2}$ is positive semi-definite and Hermitian. Therefore, $S\big(|T|_{pol}^{1/2}\big) =
\Lambda\big(|T|_{pol}^{1/2}\big) = \Lambda^{1/2}(|T|_{pol})$. Moreover, it is easy to check that $S(|T|_{pol}) = S(T)$. So,
we have
$S\big(|T|_{pol}^{1/2}\big) = \Lambda^{1/2}(|T|_{pol}) = S^{1/2}(T).$
Setting $T = AB$, it follows that
$S\big(|AB|_{pol}^{1/2}\big) = S^{1/2}(AB).$
Applying Theorem 4.2.6 for $S^{1/2}(AB)$, we obtain the result.
Lemma 4.4.2 For a square matrix $A$, we have
$S(\mathrm{re}(A)) \prec_w S(A).$
Proof: From Theorem 4.2.2 we have
$S(\mathrm{re}(A)) = S\big((A + A^H)/2\big) \prec_w \tfrac{1}{2}\big(S(A) + S(A^H)\big).$
Since $S(A) = S(A^H)$, we have $S(\mathrm{re}(A)) \prec_w S(A)$.
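For instance, the inequality can be observed numerically; the random complex test matrix
in the following sketch is an assumption of the example:

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
    reA = (A + A.conj().T) / 2

    s_re = np.linalg.svd(reA, compute_uv=False)
    s_A = np.linalg.svd(A, compute_uv=False)
    print(np.all(np.cumsum(s_re) <= np.cumsum(s_A) + 1e-10))   # S(re(A)) <_w S(A)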
The following theorem is inspired by Bhatia, Kittaneh and Li [7]. We use the
same technique as Bhatia, Kittaneh and Li used in [7, Theorem 2.2] to prove the
majorization inequality for the singular values of $AT - TB$ and $T^{-1}A - BT^{-1}$.
Theorem 4.4.3 Let $A$ and $B$ be Hermitian and $T$ be positive definite. Then
$S^2(A - B) \prec_w S(AT - TB)\,S(T^{-1}A - BT^{-1}).$
Proof: Since $A$ and $B$ are Hermitian, we have $|(A - B)^2|_{pol} = (A - B)^2$. Thus,
$S(A - B) = S\big(|(A - B)^2|_{pol}^{1/2}\big) = S\big(|(A - B)T^{-1/2} \cdot T^{1/2}(A - B)|_{pol}^{1/2}\big).$
Applying Theorem 4.3.5, we have
$S(A - B) \prec_w S\big(|T^{1/2}(A - B)^2 T^{-1/2}|_{pol}^{1/2}\big).$
Using Lemma 4.4.1, it follows that
$S(A - B) \prec_w S\big(|T^{1/2}(A - B)^2 T^{-1/2}|_{pol}^{1/2}\big) \prec_w S^{1/2}\big(T^{1/2}(A - B)T^{1/2}\big)\,S^{1/2}\big(T^{-1/2}(A - B)T^{-1/2}\big).$
Since nondecreasing convex functions preserve the weak majorization, e.g., $x^2$ for
$x \ge 0$, we square both sides to get
$S^2(A - B) \prec_w S\big(T^{1/2}(A - B)T^{1/2}\big)\,S\big(T^{-1/2}(A - B)T^{-1/2}\big).$
Using Theorem 4.3.4,
$S^2(A - B) \prec_w S\big(\mathrm{re}[(A - B)T]\big)\,S\big(\mathrm{re}[T^{-1}(A - B)]\big). \qquad (4.4.1)$
Since $BT - TB$ and $BT^{-1} - T^{-1}B$ are skew-Hermitian matrices, i.e., $C^H = -C$, we have
$\mathrm{re}(AT - TB) = \mathrm{re}[(A - B)T + (BT - TB)] = \mathrm{re}[(A - B)T] \qquad (4.4.2)$
and
$\mathrm{re}(T^{-1}A - BT^{-1}) = \mathrm{re}[T^{-1}(A - B) - (BT^{-1} - T^{-1}B)] = \mathrm{re}[T^{-1}(A - B)]. \qquad (4.4.3)$
By Lemma 4.4.2, we obtain
$S\big(\mathrm{re}(AT - TB)\big) \prec_w S(AT - TB) \qquad (4.4.4)$
and
$S\big(\mathrm{re}(T^{-1}A - BT^{-1})\big) \prec_w S(T^{-1}A - BT^{-1}). \qquad (4.4.5)$
Substituting (4.4.2) and (4.4.3) into (4.4.1), together with (4.4.4) and (4.4.5), we
obtain the result as desired.
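The resulting weak majorization, with the componentwise product on the right, can be
verified numerically; an illustrative Python/NumPy sketch with random Hermitian $A$, $B$
and a random positive definite $T$, which are assumptions of the example:

    import numpy as np

    rng = np.random.default_rng(5)
    n = 5
    A = rng.standard_normal((n, n)); A = (A + A.T) / 2
    B = rng.standard_normal((n, n)); B = (B + B.T) / 2
    M = rng.standard_normal((n, n)); T = M @ M.T + np.eye(n)   # positive definite

    Tinv = np.linalg.inv(T)
    lhs = np.linalg.svd(A - B, compute_uv=False) ** 2          # S^2(A - B)
    rhs = (np.linalg.svd(A @ T - T @ B, compute_uv=False)
           * np.linalg.svd(Tinv @ A - B @ Tinv, compute_uv=False))  # componentwise product

    print(np.all(np.cumsum(lhs) <= np.cumsum(rhs) + 1e-10))    # Theorem 4.4.3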
Remark 4.4.4 In [7, Theorem 2.2], the authors prove the inequality for unitarily
invariant norms such that $|||A - B|||^2 \le |||AT - TB|||\,|||T^{-1}A - BT^{-1}|||$, which is
equivalent to (see [7, Proposition 2.1])
$\sum_{i=1}^{k} s_i^2(A - B) \le \Big(\sum_{i=1}^{k} s_i(AT - TB)\Big)\Big(\sum_{i=1}^{k} s_i(T^{-1}A - BT^{-1})\Big)$
for $k = 1, 2, \ldots, n$. However, the weak majorization inequality in our Theorem 4.4.3
implies that
$\sum_{i=1}^{k} s_i^2(A - B) \le \sum_{i=1}^{k} s_i(AT - TB)\,s_i(T^{-1}A - BT^{-1})$
for $k = 1, 2, \ldots, n$.
Lemma 4.4.5 Under the assumptions of Theorem 4.4.3, we have
$S^2(A - B) \prec_w S^2(T^{-1})\,S^2(AT - TB).$
Proof: Since $T^{-1}A - BT^{-1} = T^{-1}(AT - TB)T^{-1}$, we have
$S(T^{-1}A - BT^{-1}) \prec_w S(T^{-1})\,S(AT - TB)\,S(T^{-1}).$
By Theorem 4.4.3, this lemma is proved.
Lemma 4.4.6 Let $A$ and $B$ be Hermitian and $T$ be invertible. We have
1. $s_{\min}(T)\,|\Lambda(A) - \Lambda(B)| \prec_w S(AT - TB)$.
2. $|\Lambda(A) - \Lambda(B)|^2 \prec_w S(AT - TB)\,S(T^{-1}A - BT^{-1})$.
3. $|\Lambda(A) - \Lambda(B)|^2 \prec_w S^2(T^{-1})\,S^2(AT - TB)$.
Proof: Let the SVD of $T$ be $U\Sigma V^H$, where $U$ and $V$ are $n$ by $n$ unitary matrices
and $\Sigma$ is diagonal. Since $T$ is invertible, it follows that $\Sigma$ is positive definite. We have
$AT - TB = AU\Sigma V^H - U\Sigma V^H B = U\big(U^H A U \Sigma - \Sigma V^H B V\big)V^H.$
Since the singular values are invariant under unitary transforms, we have
$S(AT - TB) = S\big(U^H A U \Sigma - \Sigma V^H B V\big). \qquad (4.4.6)$
Taking $A := U^H A U$, $B := V^H B V$, and $T := \Sigma$ in Theorem 4.3.7, we obtain
$s_{\min}(\Sigma)\,S\big(U^H A U - V^H B V\big) \prec_w S\big(U^H A U \Sigma - \Sigma V^H B V\big). \qquad (4.4.7)$
Applying Theorem 4.2.4, we have
$|\Lambda(A) - \Lambda(B)| = |\Lambda(U^H A U) - \Lambda(V^H B V)| \prec_w S\big(U^H A U - V^H B V\big). \qquad (4.4.8)$
Combining (4.4.6), (4.4.7), and (4.4.8), we obtain the first statement.
Next, we prove the second statement. We have
$T^{-1}A - BT^{-1} = V\Sigma^{-1}U^H A - B V\Sigma^{-1}U^H = V\big(\Sigma^{-1}U^H A U - V^H B V \Sigma^{-1}\big)U^H.$
Therefore, we have $S(T^{-1}A - BT^{-1}) = S\big(\Sigma^{-1}U^H A U - V^H B V \Sigma^{-1}\big)$.
By Theorem 4.4.3, we have
$S^2\big(U^H A U - V^H B V\big) \prec_w S\big(U^H A U \Sigma - \Sigma V^H B V\big)\,S\big(\Sigma^{-1}U^H A U - V^H B V \Sigma^{-1}\big).$
By using (4.4.8), we get the second statement. Similarly, applying Lemma 4.4.5 we
obtain the third statement.
Remark 4.4.7 It is worth noting that Theorem 4.2.4 is one special case of the first
result of Lemma 4.4.6. In Lemma 4.4.6, if we take $T$ as an identity matrix, the first
result of Lemma 4.4.6 becomes $|\Lambda(A) - \Lambda(B)| \prec_w S(A - B)$.
4.5 Perturbation analysis for principal angles
In this section, we deal with the absolute change in the principal angles between
the subspaces $\mathcal{X}$ and $\mathcal{Y}$, where the subspace $\mathcal{X}$ is perturbed to obtain a subspace
$\widetilde{\mathcal{X}}$. We establish some perturbation bounds for PABS using majorization techniques.
First, we start with a review of inequalities for the absolute changes of angles between
vectors. For vectors $x$ and $y$, if the vector $x$ is perturbed to the new vector $\tilde{x}$, we
have (see [50, 66])
$|\theta(x, y) - \theta(\tilde{x}, y)| \le \theta(x, \tilde{x}).$
The perturbation bound for the cosines of angles is
$|\cos(\theta(x, y)) - \cos(\theta(\tilde{x}, y))| \le \sin(\theta(x, \tilde{x})),$
and the perturbation bound for the sines of angles is
$|\sin(\theta(x, y)) - \sin(\theta(\tilde{x}, y))| \le \sin(\theta(x, \tilde{x})).$
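These three bounds are easy to check numerically; a Python/NumPy sketch for
illustration, in which the helper angle and the random test vectors are assumptions of
the example:

    import numpy as np

    rng = np.random.default_rng(6)

    def angle(u, v):
        # acute angle between the lines spanned by u and v
        c = abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
        return np.arccos(min(c, 1.0))

    x, y = rng.standard_normal(8), rng.standard_normal(8)
    x_t = x + 0.1 * rng.standard_normal(8)          # perturbed x

    print(abs(angle(x, y) - angle(x_t, y)) <= angle(x, x_t) + 1e-12)
    print(abs(np.cos(angle(x, y)) - np.cos(angle(x_t, y))) <= np.sin(angle(x, x_t)) + 1e-12)
    print(abs(np.sin(angle(x, y)) - np.sin(angle(x_t, y))) <= np.sin(angle(x, x_t)) + 1e-12)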
For the subspaces $\mathcal{X}$ and $\mathcal{Y}$, let the subspace $\mathcal{X}$ be perturbed to the new subspace $\widetilde{\mathcal{X}}$
with $\dim(\mathcal{X}) = \dim(\widetilde{\mathcal{X}}) = \dim(\mathcal{Y}) = p$. There exist results for majorization
inequalities for PABS similar to the angles between vectors, e.g., see [50, 66, 82].
Theorem 4.5.1 [66] $|\Theta(\mathcal{X}, \mathcal{Y}) - \Theta(\widetilde{\mathcal{X}}, \mathcal{Y})| \prec_w \Theta(\mathcal{X}, \widetilde{\mathcal{X}})$.
We note that the angles $\Theta(\mathcal{X}, \mathcal{Y})$, $\Theta(\widetilde{\mathcal{X}}, \mathcal{Y})$, and $\Theta(\mathcal{X}, \widetilde{\mathcal{X}})$ are in nonincreasing order.
The left side of the above majorization inequality is
$\big[\,|\theta_1(\mathcal{X}, \mathcal{Y}) - \theta_1(\widetilde{\mathcal{X}}, \mathcal{Y})|, \ldots, |\theta_p(\mathcal{X}, \mathcal{Y}) - \theta_p(\widetilde{\mathcal{X}}, \mathcal{Y})|\,\big].$
Theorem 4.5.1 deals with the principal angles themselves. The next two theorems
deal with the sines and cosines of the principal angles.
Theorem 4.5.2 [50] $|\cos(\Theta(\mathcal{X}, \mathcal{Y})) - \cos(\Theta(\widetilde{\mathcal{X}}, \mathcal{Y}))| \prec_w \sin(\Theta(\mathcal{X}, \widetilde{\mathcal{X}}))$.
Theorem 4.5.3 [50, 82] $|\sin(\Theta(\mathcal{X}, \mathcal{Y})) - \sin(\Theta(\widetilde{\mathcal{X}}, \mathcal{Y}))| \prec_w \sin(\Theta(\mathcal{X}, \widetilde{\mathcal{X}}))$.
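Theorems 4.5.1-4.5.3 can be illustrated numerically. In the Python/NumPy sketch below,
the helper pabs, which computes principal angles from the SVD of the product of orthonormal
bases, and the random test subspaces are assumptions of the example, not the algorithms of
Chapter 3:

    import numpy as np

    rng = np.random.default_rng(7)
    n, p = 20, 4

    def orth(M):
        return np.linalg.qr(M)[0]

    def pabs(Qx, Qy):
        # principal angles: cosines are the singular values of Qx^H Qy; decreasing order
        c = np.clip(np.linalg.svd(Qx.T @ Qy, compute_uv=False), 0.0, 1.0)
        return np.sort(np.arccos(c))[::-1]

    X = orth(rng.standard_normal((n, p)))
    Y = orth(rng.standard_normal((n, p)))
    Xt = orth(X + 0.05 * rng.standard_normal((n, p)))   # perturbed subspace

    t1, t2, t12 = pabs(X, Y), pabs(Xt, Y), pabs(X, Xt)

    wmaj = lambda lhs, rhs: np.all(np.cumsum(np.sort(lhs)[::-1])
                                   <= np.cumsum(np.sort(rhs)[::-1]) + 1e-10)

    print(wmaj(np.abs(t1 - t2), t12))                          # Theorem 4.5.1
    print(wmaj(np.abs(np.cos(t1) - np.cos(t2)), np.sin(t12)))  # Theorem 4.5.2
    print(wmaj(np.abs(np.sin(t1) - np.sin(t2)), np.sin(t12)))  # Theorem 4.5.3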
Remark 4.5.4 Generally, we have similar results for perturbation bounds for angles
in the A-based scalar product as those for angles in the standard product.
By Lemma 3.4.7, we know that the principal angles between the subspaces $\mathcal{X}$ and
$\mathcal{Y}$ relative to the A-based scalar product are equal to the principal angles between
$K\mathcal{X}$ and $K\mathcal{Y}$ in the standard product, where $A = K^H K$. By Theorem 4.5.1, we
have $|\Theta(K\mathcal{X}, K\mathcal{Y}) - \Theta(K\widetilde{\mathcal{X}}, K\mathcal{Y})| \prec_w \Theta(K\mathcal{X}, K\widetilde{\mathcal{X}})$, i.e.,
$|\Theta_A(\mathcal{X}, \mathcal{Y}) - \Theta_A(\widetilde{\mathcal{X}}, \mathcal{Y})| \prec_w \Theta_A(\mathcal{X}, \widetilde{\mathcal{X}})$. Therefore, we conclude that if the perturbation bounds for
the absolute change of principal angles in the standard product hold, then we have
similar perturbation bounds for the absolute change of principal angles in the A-based
scalar product. For this reason, in the following we only deal with the perturbation
bounds for the principal angles in the standard product.
What happens with the tangents of PABS, if one subspace is perturbed? To see
what to expect, we derive bounds for vectors first. Let $\theta(x, y) < \pi/2$ and $\theta(\tilde{x}, y) <
\pi/2$. By direct calculation,
$\tan(\theta(x, y)) - \tan(\theta(\tilde{x}, y)) = \dfrac{\sin(\theta(x, y))\cos(\theta(\tilde{x}, y)) - \cos(\theta(x, y))\sin(\theta(\tilde{x}, y))}{\cos(\theta(x, y))\cos(\theta(\tilde{x}, y))} = \dfrac{\sin(\theta(x, y) - \theta(\tilde{x}, y))}{\cos(\theta(x, y))\cos(\theta(\tilde{x}, y))}.$
Moreover,
$|\sin(\theta(x, y) - \theta(\tilde{x}, y))| = \sin(|\theta(x, y) - \theta(\tilde{x}, y)|) \le \sin(\theta(x, \tilde{x})).$
Notice that $\theta(x, y) - \theta(\tilde{x}, y) \in (-\pi/2, \pi/2)$. Thus,
$|\tan(\theta(x, y)) - \tan(\theta(\tilde{x}, y))| \le \dfrac{\sin(\theta(x, \tilde{x}))}{\cos(\theta(x, y))\cos(\theta(\tilde{x}, y))}. \qquad (4.5.1)$
In particular, assuming that $\theta(x, y) < \pi/4$ and $\theta(\tilde{x}, y) < \pi/4$, we have
$\cos(\theta(x, y))\cos(\theta(\tilde{x}, y)) = \dfrac{\cos(\theta(x, y) + \theta(\tilde{x}, y)) + \cos(\theta(x, y) - \theta(\tilde{x}, y))}{2} \ge \dfrac{\cos(\theta(x, y) - \theta(\tilde{x}, y))}{2}.$
Since cosine is decreasing on $[0, \pi/2)$ and $|\theta(x, y) - \theta(\tilde{x}, y)| \le \theta(x, \tilde{x})$, we obtain
$|\tan(\theta(x, y)) - \tan(\theta(\tilde{x}, y))| \le 2\tan(\theta(x, \tilde{x})). \qquad (4.5.2)$
In the following theorem, we replace the subspaces spanned by the vectors $x$, $y$, and
$\tilde{x}$ with multidimensional subspaces $\mathcal{X}$, $\mathcal{Y}$, and $\widetilde{\mathcal{X}}$, correspondingly. We extend the
inequalities (4.5.1) and (4.5.2) to the general case.
Theorem 4.5.5 If $\dim(\mathcal{X}) = \dim(\widetilde{\mathcal{X}}) = \dim(\mathcal{Y})$, $\Theta(\mathcal{X}, \mathcal{Y}) < \pi/2$ and $\Theta(\widetilde{\mathcal{X}}, \mathcal{Y}) <
\pi/2$, then
$\big|\tan(\Theta(\mathcal{X}, \mathcal{Y})) - \tan(\Theta(\widetilde{\mathcal{X}}, \mathcal{Y}))\big| \prec_w \dfrac{\sin(\Theta(\mathcal{X}, \widetilde{\mathcal{X}}))}{\cos(\theta_{\max}(\mathcal{X}, \mathcal{Y}))\cos(\theta_{\max}(\widetilde{\mathcal{X}}, \mathcal{Y}))}.$
In addition, if $\Theta(\mathcal{X}, \mathcal{Y}) < \pi/4$ and $\Theta(\widetilde{\mathcal{X}}, \mathcal{Y}) < \pi/4$, we have
$\big|\tan(\Theta(\mathcal{X}, \mathcal{Y})) - \tan(\Theta(\widetilde{\mathcal{X}}, \mathcal{Y}))\big| \prec_w \dfrac{2\sin(\Theta(\mathcal{X}, \widetilde{\mathcal{X}}))}{\cos(\theta_{\max}(\mathcal{X}, \widetilde{\mathcal{X}}))}.$
Proof: Considering the properties of the tangents of PABS, e.g., see Table 2.2, we
have $\tan(\Theta(\mathcal{X},\mathcal{Y})) = [S_+\big(P_{\mathcal{Y}^\perp}(P_{\mathcal{Y}} P_{\mathcal{X}})^\dagger\big), 0, \ldots, 0]$ and similarly
$\tan(\Theta(\widetilde{\mathcal{X}},\mathcal{Y})) = [S_+\big(P_{\mathcal{Y}^\perp}(P_{\mathcal{Y}} P_{\widetilde{\mathcal{X}}})^\dagger\big), 0, \ldots, 0]$.
The difference between $\tan(\Theta(\mathcal{X},\mathcal{Y}))$ and $\tan(\Theta(\widetilde{\mathcal{X}},\mathcal{Y}))$ can be written as
$S\big(P_{\mathcal{Y}^\perp}(P_{\mathcal{Y}} P_{\mathcal{X}})^\dagger\big) - S\big(P_{\mathcal{Y}^\perp}(P_{\mathcal{Y}} P_{\widetilde{\mathcal{X}}})^\dagger\big)$, except for some zero singular values. We have
$\big| S\big(P_{\mathcal{Y}^\perp}(P_{\mathcal{Y}} P_{\mathcal{X}})^\dagger\big) - S\big(P_{\mathcal{Y}^\perp}(P_{\mathcal{Y}} P_{\widetilde{\mathcal{X}}})^\dagger\big) \big| \prec_w S\big(P_{\mathcal{Y}^\perp}(P_{\mathcal{Y}} P_{\mathcal{X}})^\dagger - P_{\mathcal{Y}^\perp}(P_{\mathcal{Y}} P_{\widetilde{\mathcal{X}}})^\dagger\big) \prec_w S(P_{\mathcal{Y}^\perp})\, S\big((P_{\mathcal{Y}} P_{\mathcal{X}})^\dagger - (P_{\mathcal{Y}} P_{\widetilde{\mathcal{X}}})^\dagger\big).$
The first majorization inequality is based on Theorem 4.2.3. From Theorem 4.2.5,
we obtain the second majorization inequality. The difference between $(P_{\mathcal{Y}} P_{\mathcal{X}})^\dagger$ and
$(P_{\mathcal{Y}} P_{\widetilde{\mathcal{X}}})^\dagger$ can be treated as a perturbation of oblique projectors [79] or as a pertur-
bation of the pseudoinverse [70]. We use the technique of Wedin [79]. We have
$(P_{\mathcal{Y}} P_{\mathcal{X}})^\dagger - (P_{\mathcal{Y}} P_{\widetilde{\mathcal{X}}})^\dagger = (P_{\mathcal{Y}} P_{\mathcal{X}})^\dagger\big(I - (P_{\mathcal{Y}} P_{\widetilde{\mathcal{X}}})^\dagger\big) - \big(I - (P_{\mathcal{Y}} P_{\mathcal{X}})^\dagger\big)(P_{\mathcal{Y}} P_{\widetilde{\mathcal{X}}})^\dagger.$
We know that $(P_{\mathcal{Y}} P_{\mathcal{X}})^\dagger$ is an oblique projector onto $\mathcal{X}$ along $\mathcal{Y}^\perp$, and $I - (P_{\mathcal{Y}} P_{\mathcal{X}})^\dagger$ is
an oblique projector onto $\mathcal{Y}^\perp$ along $\mathcal{X}$. Thus,
$(P_{\mathcal{Y}} P_{\mathcal{X}})^\dagger = (P_{\mathcal{Y}} P_{\mathcal{X}})^\dagger P_{\mathcal{Y}} \quad\text{and}\quad I - (P_{\mathcal{Y}} P_{\mathcal{X}})^\dagger = \big(I - (P_{\mathcal{Y}} P_{\mathcal{X}})^\dagger\big)(I - P_{\mathcal{X}}).$
Similarly, we have
$(P_{\mathcal{Y}} P_{\widetilde{\mathcal{X}}})^\dagger = P_{\widetilde{\mathcal{X}}}(P_{\mathcal{Y}} P_{\widetilde{\mathcal{X}}})^\dagger \quad\text{and}\quad I - (P_{\mathcal{Y}} P_{\widetilde{\mathcal{X}}})^\dagger = (I - P_{\mathcal{Y}})\big(I - (P_{\mathcal{Y}} P_{\widetilde{\mathcal{X}}})^\dagger\big).$
Therefore,
$S\big((P_{\mathcal{Y}} P_{\mathcal{X}})^\dagger - (P_{\mathcal{Y}} P_{\widetilde{\mathcal{X}}})^\dagger\big) = S\big(-(I - (P_{\mathcal{Y}} P_{\mathcal{X}})^\dagger)(I - P_{\mathcal{X}})P_{\widetilde{\mathcal{X}}}(P_{\mathcal{Y}} P_{\widetilde{\mathcal{X}}})^\dagger\big) \prec_w S\big(I - (P_{\mathcal{Y}} P_{\mathcal{X}})^\dagger\big)\, S\big((I - P_{\mathcal{X}})P_{\widetilde{\mathcal{X}}}\big)\, S\big((P_{\mathcal{Y}} P_{\widetilde{\mathcal{X}}})^\dagger\big).$
The first majorization inequality follows from Theorem 4.2.5. Thus,
$\big|\tan(\Theta(\mathcal{X},\mathcal{Y})) - \tan(\Theta(\widetilde{\mathcal{X}},\mathcal{Y}))\big| \prec_w \dfrac{\sin(\Theta(\mathcal{X},\widetilde{\mathcal{X}}))}{\cos(\theta_{\max}(\mathcal{X},\mathcal{Y}))\cos(\theta_{\max}(\widetilde{\mathcal{X}},\mathcal{Y}))}.$
Furthermore, let $\Theta(\mathcal{X},\mathcal{Y}) < \pi/4$ and $\Theta(\widetilde{\mathcal{X}},\mathcal{Y}) < \pi/4$; by direct computation, we have
$\cos(\theta_{\max}(\mathcal{X},\mathcal{Y}))\cos(\theta_{\max}(\widetilde{\mathcal{X}},\mathcal{Y})) \ge \tfrac{1}{2}\cos(\theta_{\max}(\mathcal{X},\widetilde{\mathcal{X}})). \qquad \square$
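As a numerical illustration of the first bound, here is a self-contained Python/NumPy
sketch; the helpers and the randomly constructed test subspaces are assumptions of the
example, chosen so that all principal angles stay well below $\pi/2$:

    import numpy as np

    rng = np.random.default_rng(8)
    n, p = 30, 3

    def orth(M):
        return np.linalg.qr(M)[0]

    def pabs(Qx, Qy):
        c = np.clip(np.linalg.svd(Qx.T @ Qy, compute_uv=False), 0.0, 1.0)
        return np.sort(np.arccos(c))[::-1]     # decreasing principal angles

    X = orth(rng.standard_normal((n, p)))
    Y = orth(X + 0.2 * rng.standard_normal((n, p)))    # keep Theta(X, Y) well below pi/2
    Xt = orth(X + 0.05 * rng.standard_normal((n, p)))  # perturbation of X

    t_xy, t_xty, t_xxt = pabs(X, Y), pabs(Xt, Y), pabs(X, Xt)

    lhs = np.abs(np.tan(t_xy) - np.tan(t_xty))
    rhs = np.sin(t_xxt) / (np.cos(t_xy.max()) * np.cos(t_xty.max()))

    wmaj = lambda a, b: np.all(np.cumsum(np.sort(a)[::-1])
                               <= np.cumsum(np.sort(b)[::-1]) + 1e-10)
    print(wmaj(lhs, rhs))    # first bound of Theorem 4.5.5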
In Chapter 3, we propose several algorithms to compute principal angles based on
the sine of half-PABS. Next we present perturbation bounds for the sine and cosine
of half-PABS, where subspaces are perturbed.
Proposition 4.5.6
$\big|\cos^2\big(\tfrac{1}{2}\Theta(\mathcal{X}, \mathcal{Y})\big) - \cos^2\big(\tfrac{1}{2}\Theta(\widetilde{\mathcal{X}}, \mathcal{Y})\big)\big| \prec_w \sin\big(\tfrac{1}{2}\Theta(\mathcal{X}, \widetilde{\mathcal{X}})\big)\cos\big(\tfrac{1}{2}\Theta(\mathcal{X}, \widetilde{\mathcal{X}})\big),$
$\big|\sin^2\big(\tfrac{1}{2}\Theta(\mathcal{X}, \mathcal{Y})\big) - \sin^2\big(\tfrac{1}{2}\Theta(\widetilde{\mathcal{X}}, \mathcal{Y})\big)\big| \prec_w \sin\big(\tfrac{1}{2}\Theta(\mathcal{X}, \widetilde{\mathcal{X}})\big)\cos\big(\tfrac{1}{2}\Theta(\mathcal{X}, \widetilde{\mathcal{X}})\big).$
Proof: Since $\cos^2(\theta/2) = (1 + \cos(\theta))/2$ and $\sin^2(\theta/2) = (1 - \cos(\theta))/2$, by Theorem 4.5.2
it is easy to obtain the following two results,
$\big|\cos^2\big(\tfrac{1}{2}\Theta(\mathcal{X}, \mathcal{Y})\big) - \cos^2\big(\tfrac{1}{2}\Theta(\widetilde{\mathcal{X}}, \mathcal{Y})\big)\big| \prec_w \tfrac{1}{2}\sin\big(\Theta(\mathcal{X}, \widetilde{\mathcal{X}})\big)$
and
$\big|\sin^2\big(\tfrac{1}{2}\Theta(\mathcal{X}, \mathcal{Y})\big) - \sin^2\big(\tfrac{1}{2}\Theta(\widetilde{\mathcal{X}}, \mathcal{Y})\big)\big| \prec_w \tfrac{1}{2}\sin\big(\Theta(\mathcal{X}, \widetilde{\mathcal{X}})\big).$
Moreover, $\sin\big(\Theta(\mathcal{X}, \widetilde{\mathcal{X}})\big) = 2\sin\big(\tfrac{1}{2}\Theta(\mathcal{X}, \widetilde{\mathcal{X}})\big)\cos\big(\tfrac{1}{2}\Theta(\mathcal{X}, \widetilde{\mathcal{X}})\big)$.
Results in Proposition 4.5.6 are weak majorization estimates for the square of
sines or cosines of half principal angles. Next, we deal with the perturbation bound for
sines or cosines of half-angles. Before we proceed, we present the following theorem,
e.g., [70], from which two new results are derived.
Theorem 4.5.7 [70] Let $X, Y \in \mathbb{C}^{n \times p}$ with $X^H X = I$ and $Y^H Y = I$. If $2p < n$,
there are unitary matrices $Q$, $U$, and $V$ such that
$QXU = \begin{bmatrix} I \\ 0 \end{bmatrix}\begin{matrix} p \\ n-p \end{matrix}
\qquad\text{and}\qquad
QYV = \begin{bmatrix} \Gamma \\ \Omega \\ 0 \end{bmatrix}\begin{matrix} p \\ p \\ n-2p \end{matrix},$
where $\Gamma = \mathrm{diag}(\gamma_1, \ldots, \gamma_p)$ and $\Omega = \mathrm{diag}(\omega_1, \ldots, \omega_p)$ satisfy
$0 \le \gamma_1 \le \cdots \le \gamma_p \le 1, \qquad 1 \ge \omega_1 \ge \cdots \ge \omega_p \ge 0,$
$\gamma_k^2 + \omega_k^2 = 1, \quad k = 1, \ldots, p. \qquad (4.5.3)$
Lemma 4.5.8 Let $\sigma_1 \ge \cdots \ge \sigma_p$ and $\tilde{\sigma}_1 \ge \cdots \ge \tilde{\sigma}_p$ be sines of half of the principal
angles between the subspaces $\mathcal{X}, \mathcal{Y}$ and $\widetilde{\mathcal{X}}, \mathcal{Y}$, correspondingly. Then, for $k = 1, \ldots, p$,
$|\sigma_k - \tilde{\sigma}_k| \le \sin\big(\tfrac{1}{2}\theta_{\max}(\mathcal{X}, \widetilde{\mathcal{X}})\big). \qquad (4.5.4)$
Proof: Without loss of generality, we assume that the dimension of the subspaces
$\mathcal{X}$, $\mathcal{Y}$, and $\widetilde{\mathcal{X}}$ is less than $n/2$; otherwise, we could deal with the complementary
subspaces of $\mathcal{X}$ and $\mathcal{Y}$. Let the columns of $X$ and $Y$ be orthonormal bases for the
subspaces $\mathcal{X}$ and $\mathcal{Y}$, correspondingly. Applying Theorem 4.5.7, we have
$S(XU - YV) = S\left(\begin{bmatrix} I - \Gamma \\ -\Omega \\ 0 \end{bmatrix}\right),$
where $U$, $V$, $\Gamma$, $\Omega$, and $Q$ are as described in Theorem 4.5.7 and $\Gamma = \cos(\Theta(\mathcal{X}, \mathcal{Y}))$. Fur-
thermore, by Theorem 3.2.1, $S(XU - YV) = 2\sin(\Theta(\mathcal{X}, \mathcal{Y})/2)$. In a similar way,
we obtain
$2\sin(\Theta(\widetilde{\mathcal{X}}, \mathcal{Y})/2) = S\left(\begin{bmatrix} I - \widetilde{\Gamma} \\ -\widetilde{\Omega} \\ 0 \end{bmatrix}\right),$
where $\widetilde{\Gamma} = \cos(\Theta(\widetilde{\mathcal{X}}, \mathcal{Y}))$ and $\widetilde{\Omega} = \sin(\Theta(\widetilde{\mathcal{X}}, \mathcal{Y}))$.
Therefore,
$\left| 2\sin\big(\tfrac{1}{2}\Theta(\mathcal{X}, \mathcal{Y})\big) - 2\sin\big(\tfrac{1}{2}\Theta(\widetilde{\mathcal{X}}, \mathcal{Y})\big) \right| \prec_w S\left(\begin{bmatrix} \widetilde{\Gamma} - \Gamma \\ \widetilde{\Omega} - \Omega \\ 0 \end{bmatrix}\right) = S\Big(\big[(\widetilde{\Gamma} - \Gamma)^2 + (\widetilde{\Omega} - \Omega)^2\big]^{1/2}\Big)$
by Theorem 4.2.3. We note that $(\widetilde{\Gamma} - \Gamma)^2 + (\widetilde{\Omega} - \Omega)^2$ is a diagonal matrix with diagonal
elements $2 - 2\cos\big(\theta_k(\mathcal{X}, \mathcal{Y}) - \theta_k(\widetilde{\mathcal{X}}, \mathcal{Y})\big)$ for $k = 1, \ldots, p$. Hence, in particular, for each $k$,
$\big| 2\sin\big(\tfrac{1}{2}\theta_k(\mathcal{X}, \mathcal{Y})\big) - 2\sin\big(\tfrac{1}{2}\theta_k(\widetilde{\mathcal{X}}, \mathcal{Y})\big) \big| \le 2\max_i \big|\sin\big(\tfrac{1}{2}(\theta_i(\mathcal{X}, \mathcal{Y}) - \theta_i(\widetilde{\mathcal{X}}, \mathcal{Y}))\big)\big|.$
According to Theorem 4.5.1, $\max_i |\theta_i(\mathcal{X}, \mathcal{Y}) - \theta_i(\widetilde{\mathcal{X}}, \mathcal{Y})| \le \theta_{\max}(\mathcal{X}, \widetilde{\mathcal{X}})$.
Therefore,
$\big| 2\sin\big(\tfrac{1}{2}\theta_k(\mathcal{X}, \mathcal{Y})\big) - 2\sin\big(\tfrac{1}{2}\theta_k(\widetilde{\mathcal{X}}, \mathcal{Y})\big) \big| \le 2\sin\big(\tfrac{1}{2}\theta_{\max}(\mathcal{X}, \widetilde{\mathcal{X}})\big),$
which completes the proof.
Lemma 4.5.9 Let $\mu_1 \ge \cdots \ge \mu_p$ and $\tilde{\mu}_1 \ge \cdots \ge \tilde{\mu}_p$ be cosines of half of the principal
angles between the subspaces $\mathcal{X}, \mathcal{Y}$ and $\widetilde{\mathcal{X}}, \mathcal{Y}$, correspondingly. Then, for $k = 1, \ldots, p$,
$|\mu_k - \tilde{\mu}_k| \le \sin\big(\tfrac{1}{2}\theta_{\max}(\mathcal{X}, \widetilde{\mathcal{X}})\big). \qquad (4.5.5)$
Proof: In the notation of the proof of Lemma 4.5.8, we have
$S(XU + YV) = 2\cos(\Theta(\mathcal{X}, \mathcal{Y})/2).$
The rest of the proof is similar to the proof of Lemma 4.5.8.
Remark 4.5.10 In [82], Zhang and Li developed majorization inequalities for concave
functions of principal angles. They state that if $f : [0, \pi/2] \to [0, \infty)$ is a
nondecreasing concave function, then
$|f(\Theta(\mathcal{X}, \mathcal{Y})) - f(\Theta(\widetilde{\mathcal{X}}, \mathcal{Y}))| \prec_w f(\Theta(\mathcal{X}, \widetilde{\mathcal{X}})).$
Since $\sin(x/2)$ is a nondecreasing concave function on $[0, \pi/2]$, we have
$\big|\sin\big(\tfrac{1}{2}\Theta(\mathcal{X}, \mathcal{Y})\big) - \sin\big(\tfrac{1}{2}\Theta(\widetilde{\mathcal{X}}, \mathcal{Y})\big)\big| \prec_w \sin\big(\tfrac{1}{2}\Theta(\mathcal{X}, \widetilde{\mathcal{X}})\big).$
Although the result we obtain in Lemma 4.5.8 is not as strong as the above weak
majorization inequality of sines of half-PABS, we use a different technique to prove
(4.5.4). Furthermore, we use the same technique to prove the bound for the change of
cosines of half-PABS in Lemma 4.5.9.
Proposition 4.5.11 Let
$a = \cos\big(\Theta(\mathcal{X}, \mathcal{Y})/2\big) - \cos\big(\Theta(\widetilde{\mathcal{X}}, \mathcal{Y})/2\big),$
$b = \sin\big(\Theta(\mathcal{X}, \mathcal{Y})/2\big) - \sin\big(\Theta(\widetilde{\mathcal{X}}, \mathcal{Y})/2\big).$
Then
$\big[\,|a| \;\; |b|\,\big] \prec_w \big[\sqrt{2}\,\sin\big(\tfrac{1}{2}\Theta(\mathcal{X}, \widetilde{\mathcal{X}})\big),\; 0, \ldots, 0\big].$
Proof: Let the columns of the matrices $X$, $\widetilde{X}$, and $Y$ be orthonormal bases for the subspaces
$\mathcal{X}$, $\widetilde{\mathcal{X}}$, and $\mathcal{Y}$, correspondingly. Let the SVD of $X^H \widetilde{X}$ be $U\Sigma V^H$. By Theorem 3.1.1
we have
$\sqrt{2}\,\Big[\cos\big(\tfrac{1}{2}\Theta(\mathcal{X}, \mathcal{Y})\big),\; \sin\big(\tfrac{1}{2}\Theta(\mathcal{X}, \mathcal{Y})\big)\Big] = S\big([XU \;\; Y]\big)$
and
$\sqrt{2}\,\Big[\cos\big(\tfrac{1}{2}\Theta(\widetilde{\mathcal{X}}, \mathcal{Y})\big),\; \sin\big(\tfrac{1}{2}\Theta(\widetilde{\mathcal{X}}, \mathcal{Y})\big)\Big] = S\big([\widetilde{X}V \;\; Y]\big).$
We may need to add zeros to the vectors on the right side. Moreover, we have
$\Big| S\big([XU \;\; Y]\big) - S\big([\widetilde{X}V \;\; Y]\big) \Big| \prec_w S\big([XU - \widetilde{X}V \;\; 0]\big) = \big[S(XU - \widetilde{X}V),\; 0, \ldots, 0\big] = \big[S(XUV^H - \widetilde{X}),\; 0, \ldots, 0\big] = \big[2\sin\big(\tfrac{1}{2}\Theta(\mathcal{X}, \widetilde{\mathcal{X}})\big),\; 0, \ldots, 0\big].$
From Theorem 4.2.3, we obtain the weak majorization in the first inequality. The last
equality is obtained by Theorem 3.2.1. Let $a = \cos\big(\Theta(\mathcal{X}, \mathcal{Y})/2\big) - \cos\big(\Theta(\widetilde{\mathcal{X}}, \mathcal{Y})/2\big)$
and $b = \sin\big(\Theta(\mathcal{X}, \mathcal{Y})/2\big) - \sin\big(\Theta(\widetilde{\mathcal{X}}, \mathcal{Y})/2\big)$. Then we have
$\sqrt{2}\,\big[\,|a| \;\; |b|\,\big] \prec_w \big[2\sin\big(\tfrac{1}{2}\Theta(\mathcal{X}, \widetilde{\mathcal{X}})\big),\; 0, \ldots, 0\big],$
which completes the proof.
Remark 4.5.12 Let $a = [a_1, \ldots, a_p]$ and $b = [b_1, \ldots, b_p]$, where
$a_i = \cos\big(\theta_i(\mathcal{X}, \mathcal{Y})/2\big) - \cos\big(\theta_i(\widetilde{\mathcal{X}}, \mathcal{Y})/2\big),$
$b_i = \sin\big(\theta_i(\mathcal{X}, \mathcal{Y})/2\big) - \sin\big(\theta_i(\widetilde{\mathcal{X}}, \mathcal{Y})/2\big).$
From the definition of weak majorization, we have
$\sum_{i=1}^{p} \big(|a_i| + |b_i|\big) \le \sqrt{2}\,\sum_{i=1}^{p} \sin\big(\tfrac{1}{2}\theta_i(\mathcal{X}, \widetilde{\mathcal{X}})\big).$
Moreover, it is worth noting that the bound for the sum of the absolute changes of the
sines of half-PABS and the absolute changes of cosines of half-PABS in Proposition
4.5.11 is sharp, since the constant on the right side of Proposition 4.5.11 is $\sqrt{2}$, and
it would be at least 2 by combining Lemmas 4.5.8 and 4.5.9.
All results for the perturbation bounds for changes of angles presented above use
the condition that one of the subspaces is perturbed. Now we assume that both subspaces $\mathcal{X}$
and $\mathcal{Y}$ are perturbed to subspaces $\widetilde{\mathcal{X}}$ and $\widetilde{\mathcal{Y}}$, correspondingly. We obtain the following
corollary, which can be easily obtained from Theorems 4.5.1, 4.5.2, and 4.5.3.
Corollary 4.5.13 If $\dim(\mathcal{X}) = \dim(\widetilde{\mathcal{X}}) = \dim(\mathcal{Y}) = \dim(\widetilde{\mathcal{Y}})$, we have
$|\Theta(\mathcal{X}, \mathcal{Y}) - \Theta(\widetilde{\mathcal{X}}, \widetilde{\mathcal{Y}})| \prec_w \Theta(\mathcal{X}, \widetilde{\mathcal{X}}) + \Theta(\mathcal{Y}, \widetilde{\mathcal{Y}}),$
$|\cos(\Theta(\mathcal{X}, \mathcal{Y})) - \cos(\Theta(\widetilde{\mathcal{X}}, \widetilde{\mathcal{Y}}))| \prec_w \sin(\Theta(\mathcal{X}, \widetilde{\mathcal{X}})) + \sin(\Theta(\mathcal{Y}, \widetilde{\mathcal{Y}})),$
$|\sin(\Theta(\mathcal{X}, \mathcal{Y})) - \sin(\Theta(\widetilde{\mathcal{X}}, \widetilde{\mathcal{Y}}))| \prec_w \sin(\Theta(\mathcal{X}, \widetilde{\mathcal{X}})) + \sin(\Theta(\mathcal{Y}, \widetilde{\mathcal{Y}})).$
Proof: We prove the first statement:
$|\Theta(\mathcal{X}, \mathcal{Y}) - \Theta(\widetilde{\mathcal{X}}, \widetilde{\mathcal{Y}})| = |\Theta(\mathcal{X}, \mathcal{Y}) - \Theta(\widetilde{\mathcal{X}}, \mathcal{Y}) + \Theta(\widetilde{\mathcal{X}}, \mathcal{Y}) - \Theta(\widetilde{\mathcal{X}}, \widetilde{\mathcal{Y}})|$
$\le |\Theta(\mathcal{X}, \mathcal{Y}) - \Theta(\widetilde{\mathcal{X}}, \mathcal{Y})| + |\Theta(\widetilde{\mathcal{X}}, \mathcal{Y}) - \Theta(\widetilde{\mathcal{X}}, \widetilde{\mathcal{Y}})|$
$\prec_w \Theta(\mathcal{X}, \widetilde{\mathcal{X}}) + \Theta(\mathcal{Y}, \widetilde{\mathcal{Y}}).$
The proof for the other two is similar.
Similarly, we bound the squares of sines and cosines of half-PABS as follows.
Corollary 4.5.14
$\big|\cos^2\big(\tfrac{1}{2}\Theta(\mathcal{X}, \mathcal{Y})\big) - \cos^2\big(\tfrac{1}{2}\Theta(\widetilde{\mathcal{X}}, \widetilde{\mathcal{Y}})\big)\big| \prec_w \tfrac{1}{2}\sin(\Theta(\mathcal{X}, \widetilde{\mathcal{X}})) + \tfrac{1}{2}\sin(\Theta(\mathcal{Y}, \widetilde{\mathcal{Y}})),$
$\big|\sin^2\big(\tfrac{1}{2}\Theta(\mathcal{X}, \mathcal{Y})\big) - \sin^2\big(\tfrac{1}{2}\Theta(\widetilde{\mathcal{X}}, \widetilde{\mathcal{Y}})\big)\big| \prec_w \tfrac{1}{2}\sin(\Theta(\mathcal{X}, \widetilde{\mathcal{X}})) + \tfrac{1}{2}\sin(\Theta(\mathcal{Y}, \widetilde{\mathcal{Y}})).$
Theorem 4.5.15 Let $\sigma_1 \ge \cdots \ge \sigma_p$ and $\tilde{\sigma}_1 \ge \cdots \ge \tilde{\sigma}_p$ be sines of half of the principal
angles between the subspaces $\mathcal{X}, \mathcal{Y}$ and $\widetilde{\mathcal{X}}, \widetilde{\mathcal{Y}}$, correspondingly. Then, for $k = 1, \ldots, p$,
$|\sigma_k - \tilde{\sigma}_k| \le \sin\big(\tfrac{1}{2}\theta_{\max}(\mathcal{X}, \widetilde{\mathcal{X}})\big) + \sin\big(\tfrac{1}{2}\theta_{\max}(\mathcal{Y}, \widetilde{\mathcal{Y}})\big). \qquad (4.5.6)$
Proof: First, by Lemma 4.5.8, for $k = 1, \ldots, p$ we have
$|\sigma_k - \hat{\sigma}_k| \le \sin\big(\tfrac{1}{2}\theta_{\max}(\mathcal{X}, \widetilde{\mathcal{X}})\big),$
where $\hat{\sigma}_1 \ge \cdots \ge \hat{\sigma}_p$ denote the sines of half of the principal angles between $\widetilde{\mathcal{X}}$ and $\mathcal{Y}$.
Second, we apply a similar statement to the sines of half-PABS between the subspaces
$\widetilde{\mathcal{X}}, \mathcal{Y}$ and $\widetilde{\mathcal{X}}, \widetilde{\mathcal{Y}}$, correspondingly. We have
$|\hat{\sigma}_k - \tilde{\sigma}_k| \le \sin\big(\tfrac{1}{2}\theta_{\max}(\mathcal{Y}, \widetilde{\mathcal{Y}})\big).$
Using the triangle inequality, we obtain the result as desired.
Similarly, we can establish an estimate for the cosines of half-PABS with respect
to perturbations of both subspaces.
Theorem 4.5.16 Let $\mu_1 \ge \cdots \ge \mu_p$ and $\tilde{\mu}_1 \ge \cdots \ge \tilde{\mu}_p$ be cosines of half of the
principal angles between the subspaces $\mathcal{X}, \mathcal{Y}$ and $\widetilde{\mathcal{X}}, \widetilde{\mathcal{Y}}$, correspondingly. Then, for $k =
1, \ldots, p$,
$|\mu_k - \tilde{\mu}_k| \le \sin\big(\tfrac{1}{2}\theta_{\max}(\mathcal{X}, \widetilde{\mathcal{X}})\big) + \sin\big(\tfrac{1}{2}\theta_{\max}(\mathcal{Y}, \widetilde{\mathcal{Y}})\big). \qquad (4.5.7)$
Let us emphasize that we can obtain similar perturbation bounds for sines or
cosines of half-PABS with respect to the A-based scalar product as in Lemmas 4.5.8
and 4.5.9, and Theorems 4.5.15 and 4.5.16.
Let us now turn our attention to a perturbation analysis in terms of the matrices $X_1$
and $X_1 + \Delta X_1$ that generate the subspaces $\mathcal{X}$ and $\widetilde{\mathcal{X}}$. Often a subspace is defined by
the column space of a matrix, and we are interested in estimating the sensitivity of the
column space of a matrix. There are several results provided by Sun [74], and Golub
and Zha [29] involving the perturbation analysis for the principal angles of a pair of
matrices. The bounds in [74] are improved in [29] without restricting the dimensions
of the two subspaces to be equal. Moreover, Knyazev and Argentati [48] generalize the
results of [29, 74] to the A-based scalar product.
Lemma 4.5.17 [29, 48, 74] Let $\mathcal{X}$ and $\widetilde{\mathcal{X}}$ be the column spaces of the matrices $X_1$ and
$X_1 + \Delta X_1$, correspondingly. Let $\kappa_A(X_1) = s_{\max}(A^{1/2}X_1)/s_{\min}(A^{1/2}X_1)$ denote the
corresponding A-based condition number of $X_1$. Then
$\sin\big(\theta^A_{\max}(\mathcal{X}, \widetilde{\mathcal{X}})\big) \le \kappa_A(X_1)\,\frac{\|\Delta X_1\|_A}{\|X_1\|_A}.$
In particular, for A = I, we have