Citation
Preconditioned iterative methods for linear systems, eigenvalue and singular value problems

Material Information

Title:
Preconditioned iterative methods for linear systems, eigenvalue and singular value problems
Creator:
Vecharynski, Eugene
Publication Date:
Language:
English
Physical Description:
xiii, 149 leaves : ; 28 cm.

Subjects

Subjects / Keywords:
Iterative methods (Mathematics) ( lcsh )
Linear systems -- Mathematical models ( lcsh )
Eigenvalues ( lcsh )
Genre:
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

Notes

Thesis:
Thesis (Ph. D.)--University of Colorado Denver, 2011.
Bibliography:
Includes bibliographical references (leaves 143-149).
General Note:
Department of Mathematical and Statistical Sciences
Statement of Responsibility:
by Eugene Vecharynski.

Record Information

Source Institution:
University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
747540792 ( OCLC )
ocn747540792

Full Text
PRECONDITIONED ITERATIVE METHODS FOR LINEAR SYSTEMS,
EIGENVALUE AND SINGULAR VALUE PROBLEMS
by
Eugene Vecharynski
M.S., Belarus State University, 2006
A thesis submitted to the
University of Colorado Denver
in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
Applied Mathematics


2011 by Eugene Vecharynski
All rights reserved.


This thesis for the Doctor of Philosophy
degree by
Eugene Vecharynski
has been approved
by
/i- Izz Iloio
Date


Vecharynski, Eugene (Ph.D., Applied Mathematics)
Preconditioned Iterative Methods for Linear Systems, Eigenvalue and Singular
Value Problems
Thesis directed by Professor Andrew Knyazev
ABSTRACT
In the present dissertation we consider three crucial problems of numerical
linear algebra: solution of a linear system, an eigenvalue, and a singular value
problem. We focus on the solution methods which are iterative by their nature,
matrix-free, preconditioned and require a fixed amount of computational work
per iteration. In particular, this manuscript aims to contribute to the areas of
research related to the convergence theory of the restarted Krylov subspace min-
imal residual methods, preconditioning for symmetric indefinite linear systems,
approximation of interior eigenpairs of symmetric operators, and preconditioned
singular value computations.
We first consider solving non-Hermitian linear systems with the restarted
generalized minimal residual method (GMRES). We prove that the cycle-
convergence of the method applied to a system of linear equations with a normal
(preconditioned) coefficient matrix is sublinear. In the general case, however,
it is shown that any admissible cycle-convergence behavior is possible for the
restarted GMRES at a number of initial cycles, moreover the spectrum of the
coefficient matrix alone does not determine this cycle-convergence.


Next we shift our attention to iterative methods for solving symmetric indefi-
nite systems of linear equations with symmetric positive definite preconditioners.
We describe a hierarchy of such methods, from a stationary iteration to the op-
timal Krylov subspace preconditioned minimal residual method, and suggest a
preconditioning strategy based on an approximation of the inverse of the abso-
lute value of the coefficient matrix (absolute value preconditioners). We present
an example of a simple (geometric) multigrid absolute value preconditioner for
the symmetric model problem of the discretized real Helmholtz (shifted Lapla-
cian) equation in two spatial dimensions with a relatively low wavenumber.
We extend the ideas underlying the methods for solving symmetric indefinite
linear systems to the problem of computing an interior eigenpair of a symmet-
ric operator. We present a method that we call the Preconditioned Locally
Minimal Residual method (PLMR), which represents a technique for finding
an eigenpair corresponding to the smallest, in the absolute value, eigenvalue of
a (generalized) symmetric matrix pencil. The method is based on the idea of
the refined extraction procedure, performed in the preconditioner-based inner
product over four-dimensional trial subspaces, and relies on the choice of the
(symmetric positive definite) absolute value preconditioner.
Finally, we consider the problem of finding a singular triplet of a matrix. We
suggest a preconditioned iterative method called PLMR-SVD for computing a
singular triplet corresponding to the smallest singular value, and introduce pre-
conditioning for the problem. At each iteration, the method extracts approxima-
tions for the right and left singular vectors from two separate four-dimensional
trial subspaces by solving small quadratically constrained quadratic programs.


We illustrate the performance of the method on the example of the model prob-
lem of finding the singular triplet corresponding to the smallest singular value
of a gradient operator discretized over a two-dimensional domain. We construct
a simple multigrid preconditioner for this problem.
This abstract accurately represents the content of the candidates thesis. I
recommend its publication.
Signed
Andrew Knyazev


DEDICATION
To my family and friends.


ACKNOWLEDGMENT
I am deeply grateful to my advisor, Professor Andrew Knyazev, for introducing
me into the field. His vision of many problems and insights into their solution
have definitely influenced this work. Without his guidance and support the
present dissertation would have been impossible. I would like to direct my
deepest thank to Dr. Julien Langou. His advice and opinion, as of a colleague
and as of a friend, have always been important and timely. Chapter 2 of this
dissertation is based on the research that I have performed under his supervision.
It has partially been written during the three months support kindly provided
by Julien in Summer 2008. I am also indebted to Professor Jan Mandel for
introduction into the basics of multilevel methods. Several learned ideas have
echoed in this manuscript. I am grateful to Dr. Merico Argentati and Professor
Michele Benzi for reading this thesis and agreeing to be on my PhD committee.
I am thankful to my family and friends. Their care and support have been
inspiring during all stages of work on this dissertation.
Finally, I would like to thank the faculty and my fellow students at the
University of Colorado Denver for creating an excellent working atmosphere.
I am also grateful to Mr. and Mrs. Warren Bateman and the Department of
Mathematical and Statistical Sciences for the financial support.


CONTENTS
Figures .................................................................. xi
Tables.................................................................. xiii
Chapter
1. Introduction............................................................ 1
2. Convergence of the restarted GMRES...................................... 6
2.1 The sublinear cycle-convergence of GMRES(m) for normal matrices 10
2.1.1 Krylov matrix, its pseudoinverse, and spectral factorization ... 11
2.1.2 The sublinear cycle-convergence of GMRES(m)...................... 13
2.2 Any admissible cycle-convergence behavior is possible for the restarted
GMRES at its initial cycles......................................... 24
2.2.1 Outline of the proof of Theorem 2.11............................. 27
2.2.2 Proof of Theorem 2.11 for the case of a strictly decreasing cycle-
convergence ....................................................... 28
2.2.3 Extension to the case of stagnation................................ 37
2.2.4 Difference with the work of Greenbaum, Ptak, and Strakos [34] . 38
2.2.5 Generating examples with nonzero rq+1.............................. 39
2.2.6 Any admissible convergence behavior is possible for full and restarted
GMRES (at its q initial cycles).................................. 43
2.2.7 Restarted GMRES with variable restart parameter.................... 45
2.3 Conclusions.......................................................... 45
viii


3. Solution of symmetric indefinite systems with symmetric positive defi-
nite preconditioners................................................... 47
3.1 Iterative methods for symmetric indefinite systems with SPD pre-
conditioners ........................................................ 50
3.1.1 Stationary iteration for solving symmetric indefinite systems with
SPD preconditioners................................................. 53
3.1.2 Simple residual-minimizing methods for solving symmetric indefi-
nite systems with SPD preconditioners..................................... 58
3.1.3 The second-order and minimal residual methods for solving indef-
inite systems with SPD preconditioners.................................... 61
3.2 Absolute value preconditioners for symmetric indefinite systems . . 65
3.2.1 Optimal SPD preconditioners for symmetric indefinite systems . 65
3.2.2 An absolute value preconditioner for a model problem .............. 69
3.2.2.1 Multigrid absolute value preconditioner........................... 71
3.2.2.2 Numerical examples................................................ 77
3.3 Conclusions...................................................... 81
4. Preconditioned computations of interior eigenpairs of symmetric oper-
ators ................................................................. 84
4.1 Idealized preconditioned methods for finding an interior eigenpair . 87
4.2 The Preconditioned Locally Minimal Residual method for computing
interior eigenpairs.............................................. 95
4.2.1 PLMR: The choice of trial subspaces............................. 96
4.2.2 PLMR: The choice of iteration parameters........................ 98
4.3 Numerical examples .................................................. 103
ix


4.4 Conclusions.......................................................... 108
5. Preconditioned singular value computations .................... 110
5.1 Idealized preconditioned methods for finding a singular triplet . . 117
5.2 The Preconditioned Locally Minimal Residual method for computing
the smallest singular triplet....................................... 123
5.2.1 PLMR-SVD: The choice of trial subspaces...................... 124
5.2.2 PLMR-SVD: The choice of iteration parameters................. 127
5.3 Numerical example.................................................... 133
5.4 Conclusions.......................................................... 141
References......................................................... 143
x


FIGURES
Figure
2.1 Cycle-convergence of GMRES(5) applied to a 100-by-100 normal ma-
trix................................................................ 18
2.2 Cycle-convergence of GMRES(5) applied to a 100-by-100 diagonal-
izable (nonnormal) matrix........................................ 22
3.1 Comparison of the MG absolute value and the inverted Laplacian
preconditioners for PMINRES applied to the model problem of the
size n = (27 l)2 1.6 x 104................................... 78
3.2 Performance of the MG absolute value preconditioners for the model
problem with different shift values. The problem size n = (27 l)2 rs
1.6 x 104. The number of negative eigenvalues varies from 0 to 75. . 80
3.3 Comparison of PMINRES with locally optimal methods (3.17),
(3.19) and (3.21), (3.24), all with the MG absolute value precon-
ditioners, applied to the model problem of the size n = (27 l)2
1.6 x 104........................................................ 81
xi


4.1 Comparison of the PLMR method with the MG absolute value pre-
conditioner versus the idealized eigenvalue solvers, applied to the
model eigenproblem of the size n = (27 l)2 1.6 x 104. The tar-
geted eigenpairs correspond to the smallest magnitude eigenvalues of
the shifted discrete negative Laplacian (from top left to bottom left,
clockwise): A13 ss 6.33 x 10-4, Ai3 2.7426, A15 3.4268 and
A17 ss 7.19 x 104, given by shift values c2 = 197.258, 200, 250 and
256.299, respectively.............................................. 105
4.2 Comparison of the PLMR method with and without orthogonaliza-
tion on the trial subspaces. Both versions of the method are applied
to the model eigenproblem of the size n = (27 l)2 1.6 x 104 and
use the MG absolute value preconditioner. The targeted eigenpairs
correspond to the smallest magnitude eigenvalues of the shifted dis-
crete negative Laplacian: Ai3 2.7426 (left) and Ai5 3.4268
(right), given by shift values c2 = 200 and 250, respectively......107
5.1 Comparison of the PLMR-SVD method with one MG preconditioner
versus the idealized singular value solvers, applied to find the smallest
singular triplet of the m-by-n discrete gradient operator, n (27
l)2 1.6 x 104, m 2n................................................... 139


TABLES
Table
3.1 Mesh-independent convergence of PMINRES with the MG absolute
value preconditioner........................................... 79
xiii


1. Introduction
Complex numerical simulations and solutions of mathematical problems on
large-scale data sets have become the routine tasks in cutting edge research
and industry, resulting in a broad variety of computational algorithms. The
nature of the algorithms can be very diverse, however, often their efficiency
and robustness rely, ultimately, on the underlying techniques for solving basic
problems of numerical linear algebra.
In this work, we consider numerical solution of linear systems, eigenvalue
problems, and singular value problems; see, e.g., [42]. We assume that the
problems are of an extremely large size, and possibly sparse, i.e., the involved
coefficient matrices contain a significant number of zero entries. The exact so-
lutions of such problems are rarely needed. Instead, it is desirable to construct
computationally inexpensive approximations to the exact solutions. In this con-
text, the use of iterative methods, see, e.g., [3, 33, 59, 56], may be the only
option. The study of theoretical and practical aspects of several iterative meth-
ods, as well as the introduction of novel iterative techniques for solving the above
mentioned problems constitutes the core of the present dissertation.
The methods that we consider in this work share a number of common
characteristics. First, their mathematical formulations are based on short-term
recurrent relations, which allows constructing solvers with a fixed amount of
computational work and storage per iteration. Second, the methods are precon-
ditioned, i.e., they can use auxiliary operators, called preconditioners, which, if
1


properly defined, significantly improve the convergence, and, ideally, only mod-
estly affect the cost of each iteration.
In the current manuscript, we address a set of computationally challenging
problems, such as numerical solution of symmetric indefinite and nonsymmetric
linear systems, computation of interior eigenpairs of symmetric matrix pencils,
and finding the smallest singular triplets of general matrices. Our main re-
sults concern the convergence theory of the restarted Krylov subspace minimal
residual methods, novel preconditioning strategies for symmetric indefinite lin-
ear systems and eigenvalue problems, as well as the extension of the concept of
preconditioning to singular value problems.
In Chapter 2, we consider the restarted generalized minimal residual method
(GMRES) for non-Hermitian linear systems. We prove that the cycle-convergence
of the method applied to a system of linear equations with a normal (precondi-
tioned) coefficient matrix is sublinear. In the general case, however, it is shown
that any admissible cycle-convergence behavior is possible for the restarted GM-
RES at a number of initial cycles, moreover the spectrum of the coefficient matrix
alone does not determine this cycle-convergence. The results of this chapter are
mostly published in [77, 76].
In Chapters 3, 4, and 5, we consider iterative solution of symmetric indefinite
systems, symmetric eigenvalue, and singular value problems, respectively. The
material is presented in such a way that we can emphasize the interconnections
between the problems, which allows us to treat their numerical solution within
a unified approach. The obtained results, presented in the chapters, appear here
for the first time. We note that the choice of the real vector spaces has been
2


motivated merely by the desire to simplify the presentation. The extension to
the complex case is straightforward.
In Chapter 3, first, we describe a hierarchy of methods for solving symmetric
indefinite linear systems with symmetric positive definite (SPD) precondition-
ers. These methods are, mainly, based on the known idea of the minimization
of the residual in the preconditioner-based norm. The careful study of such
methods is motivated by a search of appropriate iterative schemes, which can
be extended to the problems of finding interior eigenpairs of symmetric oper-
ators, as well as computing the smallest singular triplets of general matrices.
For example, we describe a method, which can be viewed as a natural analogue
of the preconditioned steepest descent algorithm for solving SPD systems, and
is the simplest proved to be convergent residual-minimizing method for solving
symmetric indefinite systems with an SPD preconditioner. We use the locally
optimal accelerations of this method to construct the base scheme, which is
further extended to eigenvalue and singular value problems.
Second, in Chapter 3, we suggest a novel preconditioning strategy, which is
based on the idea of approximating the inverse of the absolute value of the coef-
ficient matrix. We call preconditioners, which are obtained using this strategy,
the absolute value preconditioners. We show, for a model problem of the dis-
cretized real Helmholtz (shifted Laplacian) equation in two spatial dimensions
with a relatively low wavenumber, that such preconditioners can be efficiently
constructed, e.g., in the multigrid framework. It is significant that the same
preconditioners can be used for finding interior eigenpairs of symmetric matrix
pencils, if applied within the scheme described in Chapter 4. The absolute value
3


preconditioners for symmetric indefinite systems also motivate the definition of
preconditioners for the singular value problems in Chapter 5.
Using the results of Chapter 3, in Chapter 4, we present a new method, that
we call the Preconditioned Locally Minimal Residual method (PLMR), which
represents a technique for finding an eigenpair corresponding to the smallest, in
the absolute value, eigenvalue of a (generalized) symmetric matrix pencil. The
method is based on the idea of the refined extraction procedure, also called the
refined projection procedure, performed in the preconditioner-based inner prod-
uct over four-dimensional trial subspaces, and relies on the choice of the (SPD)
absolute value preconditioner. We applied the described technique to the model
problem of finding an eigenpair of the two-dimensional discretized Laplace oper-
ator, which corresponds to the eigenvalue, closest to a given shift. The method
demonstrated a satisfactory convergence behavior, with the convergence rate
comparable, at a number of initial steps, to that of the optimal preconditioned
minimal residual method, applied to the problem of finding the corresponding
null space (eigenspace) of the shifted Laplacian.
Finally, in Chapter 5, we consider the problem of finding a singular triplet
of a matrix. We suggest a new preconditioned iterative method, that we refer to
as PLMR-SVD, for computing a singular triplet corresponding to the smallest
singular value. The method has several important features. First, at every
step, it uses a pair of separate four-dimensional trial subspaces for extracting
the right and left singular vectors, respectively. Second, it admits two SPD
preconditioners, designed specifically for a singular value problem. We show
that even the proper choice of only one of the two preconditioners can result in
4


a significantly improved convergence behavior. As a model problem we consider
computing a singular triplet, corresponding to the smallest singular value, of a
discrete two-dimensional gradient operator. We present a simple construction
of the multigrid preconditioner for this problem.
Let us summarize the main results obtained within the scope of the present
dissertation: we have proved two theoretical results which concern the con-
vergence theory of the restarted GMRES algorithm, introduced a new precon-
ditioning strategy for symmetric indefinite linear systems, suggested a novel
preconditioned method for computing interior eigenpairs of symmetric matrix
pencils, and described a preconditioned method for finding the smallest singular
triplets.
This work has been partially supported by the NSF-DMS 0612751.
5


2. Convergence of the restarted GMRES
**
The generalized minimal residual method (GMRES) was originally intro-
duced by Saad and Schultz [61] in 1986. and has become a popular method for
solving non-Hermitian systems of linear equations,
Ax = b, A e Cnxn, be Cn. (2.1)
Without loss of generality, to simplify the presentation below, we assume that
system (2.1) is already preconditioned.
GMRES is classified as a Krylov subspace (projection) iterative method.
At every new iteration i, GMRES constructs an approximation x^ +
JCt (A, A0)) to the exact solution of (2.1) such that the 2-norm of the corre-
sponding residual vector rW = b Ax^ is minimized over the affine space
r() _|_ Afc. (A, r^), i.e.,
r^ min ||r^ it||, (2.2)
u£AICi(A,r(0))
where /Q (A, r^) is the z-dimensional Krylov subspace
K-i (A, r^) = span{r^\ Ar^\ ..., Al_V0'}
induced by the matrix A and the initial residual vector A) = b Ax^ with x^
being an initial approximate solution of (2.1),
A/Ci (A,r(0)) = span{Ar(0), AV(0),..., AV(0)}.
6


As usual, in a linear setting, a notion of minimality is associated with some
orthogonality condition. In our case, minimization (2.2) is equivalent to forc-
ing the new residual vector to be orthogonal to the subspace AKi (A, r^)
(also known as the Krylov residual subspace). In practice, for a large problem
size, the latter orthogonality condition results in a costly procedure of orthog-
onalization against the expanding Krylov residual subspace. Orthogonalization
together with storage requirement makes the GMRES method complexity and
storage prohibitive for practical application. A straightforward treatment for
this complication is the so-called restarted GMRES [61].
The restarted GMRES, or GMRES(m), is based on restarting GMRES after
every m iterations. At each restart, we use the latest approximate solution as
the initial approximation for the next GMRES run. Within this framework a
single run of m GMRES iterations is called a GMRES(m) cycle, and m is called
the restart parameter. Consequently, restarted GMRES can be regarded as a
sequence of GMRES(m) cycles. When the convergence happens without any
restart occurring, the algorithm is known as the full GMRES.
In the context of restarted GMRES, our interest will shift towards the resid-
ual vectors r^ at the end of every kth GMRES(m) cycle (as opposed to the
residual vectors (2.2) obtained at each iteration of the algorithm).
Definition 2.1 (cycle-convergence) We define the cycle-convergence of the
restarted GMRES(m) to be the convergence of the residual nouns i|r^fc^||, where,
for each k, Ak> is the residual at the end of the kth GMRES(m) cycle.
GMRES(m) constructs approximations x^ G xSk~^ + /Cm (A, r(fe~d) to the
exact solution of (2.1) such that each residual vector Akl = b Ax^ satisfies
>-r
i


the local minimality condition
= min ||r^fc ^ w||, (2.3)
ueAICm[A,r<-k~x))
where Km (A, r^-1)) is the ra-dimensional Krylov subspace produced at the A*th
GMRES(m) cycle,
Km (A, r{k~l)) = span{r^fc-1\ AAk~l\ ..., A^r^}, (2.4)
AKm (A,r(fc-1)) = span{AAfc-1), A2Ak~l\ ..., J4mr(fc~1)} js the corresponding
Krylov residual subspace.
The price paid for the reduction of the computational work in GMRES(m)
is the loss of global optimality in (2.2). Although (2.3) implies a monotonic de-
crease of the norms of the residual vectors r^k\ GMRES(m) can stagnate [61, 80].
This is in contrast with the full GMRES which is guaranteed to converge to the
exact solution of (2.1) in n steps (assuming exact arithmetic and nonsingular A).
In practice, however, if n is too large, proper choices of a preconditioner and a
restart parameter, e.g., [25, 26, 46], can significantly accelerate the convergence
of GMRES(m), thus making the method attractive for applications.
While a great deal of effort has been devoted to the characterization of the
convergence of the full GMRES, e.g., [74, 21, 34, 35, 43, 70, 72], our understand-
ing of the behavior of GMRES(m) is far from complete, leaving us with more
questions than answers, e.g., [25].
For a few classes of matrices, convergence estimates are available for the
restarted GMRES and/or the full GMRES. For example, for real positive definite
matrices (that is, for matrices A for which H = (A+AT)/2 is symmetric positive
8


definite, or, equivalently, for matrices A for which xTAx > 0 for any nonzero
x G Rn), the Elmans bound [22, 23, 33, 61] can be stated as follows
||r||2 < (i p)fc||r{0)||2 where 0 < p = (Xmin{H)/\\A\\f < 1.
The latter guarantees the linear cycle-convergence of GMRES (m) for a positive
definite matrix. Improvements and generalizations of this bound can be found
in [8, 63, 82],
For normal matrices the convergence of the full GMRES is well studied. In
particular, the convergence is known to be governed solely by the spectrum of
A [62, 74], In Section 2.1 of this manuscript, we prove that the cycle-convergence
of restarted GMRES for normal matrices is sublinear. This statement means
that, for normal matrices, the reduction in the norm of the residual vector at the
current GMRES(m) cycle cannot be better than the reduction at the previous
cycle. We would like to mention the simultaneous but independent work [5],
where the authors present an alternative proof of the sublinear convergence of
the restarted GMRES for normal matrices.
Assuming that the coefficient matrix A is diagonalizable, some character-
izations of the convergence of the full GMRES rely on the condition number
of the eigenbasis [74]. Other characterizations of the convergence of the full
GMRES rely on pseudospectra [52]. More commonly, the field of values is
used [8, 22, 23, 33, 61, 63, 82]. A discussion on how descriptive some of these
bounds are is given by Embree [24].
In the general case, for the full GMRES, the following theorem shows that
we cannot prove convergence results based only on the spectrum of the coefficient
matrix alone.
9


Theorem 2.2 (Greenbaum, Ptak, and Strakos, 1996, [34]) Given a non-
increasing positive sequence /(0) > /(1) > > f(n 1) > 0, there exists an
n-by-n matrix A and a vector r^ with ||r^|| = /(0) such that f(i) ||r^||,
i = 1,... ,n 1, where r^ is the residual at step i of the GMRES algorithm ap-
plied to the linear system Ax = b, with initial residual r^ bAxMoreover,
the matrix A can be chosen to have any desired (nonzero) eigenvalues.
This result states that, in general, eigenvalues alone do not determine the con-
vergence of the full GMRES. A complete description of the set of all pairs {A, b}
for which the full GMRES applied to (2.1) generates the prescribed convergence
curve while the matrix A has any (nonzero) eigenvalues, is given in [2].
In Section 2.2, we show that any admissible cycle-convergence behavior is
possible for restarted GMRES at a number of initial cycles, moreover the spec-
trum of the coefficient matrix alone does not determine this cycle-convergence.
The latter can be viewed as an extension of the result of Greenbaum, Ptak, and
Strakos, given by Theorem 2.2, for the case of restarted GMRES.
2.1 The sublinear cycle-convergence of GMRES(m) for normal
matrices
Throughout this section we assume (unless otherwise explicitly stated) A to
be nonsingular and normal, i.e., A admits the decomposition
A = VAV*, (2.5)
where A e Cnxn is a diagonal matrix with the diagonal elements being the
nonzero eigenvalues of A, and V Cnxn is a unitary matrix of the corresponding
eigenvectors. || || denotes the 2-norm throughout.
10


2.1.1 Krylov matrix, its pseudoinverse, and spectral factorization
For a given restart parameter m (1 of GMRES(m) applied to system (2.1), with the initial residual vector as
GMRES(A, m, We assume that the residual vector r^k\ produced at
the end of GMRES(A, m, is nonzero.
A run of GMRES(A, m, r^-1)) generates the Krylov subspace fCm (A, r^k~^)
given in (2.4). For each /Cm (A, r^fc_1^) we define a matrix
K (A,r{k~l)) = [r{k~1) Ar(fc_1) Amr{k~l)] G C"x(m+1), (2.6)
where k = 1,2,..., q, and q is the total number of GMRES(m) cycles.
Matrix (2.6) is called the Krylov matrix. We say that K (A, r^fc_1^) cor-
responds to the cycle GMRES(A, m, Ak~^). Note that the columns of
K (A, Afc-1)) span the next (m-l-l)-dimensional Krylov subspace /Cm+i(A, Afc-1)).
According to (2.3), the assumption r^ ^ 0 implies that r^k~^ cannot be ex-
pressed as a linear combination of vectors in AJCm (A, r^-1^). Thus, the matrix
K (A, Afe-1)) in (2.6) is of the full rank,
rank (A (A,r This equality allows us to introduce the Moore-Penrose pseudoinverse of
the matrix K (A,Afc_1)), i.e.,
K] (A,r(fc_1)) = (K* (A,r(fc_1)) K (A, r(fc~1)))~1 K* (A,r{k~l)) G C(m+1)xn,
which is well-defined and unique. The following lemma shows that the first
column of (A^ (A, r^-1)))* is the next residual vector r^ up to a scaling factor.
11


Lemma 2.3 Given A G Cnxn (not necessarily normal), for any k = 1,2,..., q,
we have
(2.7)
where e\ = [1 0 0]T G Rm+1.
Proof: See Ipsen [43, Theorem 2.1], as well as [17, 65].
Another important ingredient, first described in [43], is the so-called spectral
factorization of the Krylov matrix K (A,r^fe_1^). This factorization is made of
three components that encapsulate separately the information on eigenvalues of
A, its eigenvectors, and the previous residual vector
Lemma 2.4 Let A G Cnxn satisfy (2.5). Then the Krylov matrix K (A,Ak~^),
for any k = 1,2,... q, can be factorized as
K (A,r^k~^) = VDk_iZ, (2.8)
where dk-i V*r(k~^ G Cn, Dk-i diag((4_i) G Cnxn, and Z G Cnx^m+1^ is
the Vandermonde matrix computed from the eigenvalues of A,
Z = [e Ae Ame], (2.9)
e = [1 1 l]rGP.
12


Proof: Starting from (2.5) and the definition of the Krylov matrix (2.6),
K (A, r^-1)) = Ar^k~1'1 J4mr(fc_1)j
= [l/R*r(fe_1) VAV*r{k~l) VAmV*r{k~1)]
= V [dk-i Adk-i Amdfc_i]
= V [Dk-ie ADk-ie AmDk~ie]
= VDk-X[e Ae A me\ = VDk^Z.
d
It is clear that the statement of Lemma 2.4 can be easily generalized to
the case of a diagonalizable (nonnormal) matrix A providing that we define
dk-1 = in the lemma.
2.1.2 The sublinear cycle-convergence of GMRES(m)
Along with (2.1) let us consider the system
= b (2.10)
with the matrix A replaced by its conjugate transpose. Clearly, according
to (2.5),
A* = VAV*. (2.11)
It turns out that m steps of GMRES applied to systems (2.1) and (2.10)
produce residual vectors of equal norms at each stepprovided that the initial
residual vector is identical. This observation is crucial for proving the sublinear
cycle-convergence of GMRES (m) and is formalized in the following lemma.
13


Lemma 2.5 Assume that A £ Cnxn is a nonsingular normal matrix. Let r^
and f be the nonzero residual vectors obtained by applying m steps of GMRES
to systems (2.1) and (2.10); 1 < m < n 1. Then
provided that the initial approximate solutions of (2.1) and (2.10) induce the
same initial residual vector r^.
Moreover, if pm{z) and qm(z) are the (GMRES) polynomials which mini-
mize, respectively, ||p(j4)r()|| and ||p(A*)r^|| over p(z) £ Vm, where Vm is the
set of all polynomials of degree at most m defined on the complex plane such that
p(0) = 1, then
Pm(z) = Qm(z),
where p(z) denotes the polynomial obtained from p(z) £ Vm by the complex
conjugation of its coefficients.
Proof: Let us consider a polynomial p(z) G Vm. Let r^ be a nonzero
initial residual vector for systems (2.1) and (2.10) simultaneously. Since the
matrix A is normal, so is p(A)\ thus p(A) commutes with its conjugate transpose
p*(A). We have
||p(j4)r<>||2 = (p{A)r^,p{A)rw) = {r{0\p*(A)p(A)rw)
= (rm,p(A)p*{A)r{0)) = (p*(A)r{0\p*{A)rW)
= {(Vp(A)V*)* r(0), (Vp(A)V*)* r(0))
= (Vp(K)V*rw,Vp(K)V*rw)
= (p(VAV*)r{0\p(VAV*)rw) = ||p(PAW>(0)||2,
14


where p(z) is the polynomial obtained from p(z) by conjugating its coefficients.
By (2.11) we conclude that
l|pM)r(0>|| = ||pM><0>||.
We note that the last statement is true for any polynomial p, for any r0, and for
any normal A.
Now, let us look at ||r^|| and ||f^||. On the one hand,
||r(m)|| = min ||p(A)r(0)|| = ||pm(A)r{0)|| = ||pm(A*)r(0)||
pEPm
> min ||p(v4*)r^|| = min ||p(J4*)r^|| = ||r^||.
pVm peVm
On the other hand,
||f(m)|| = min ||p(A*)r{0)|| = ||gm(v4*)r(0)|| = || P&Vm
> min ||p(.4)r^0)|| = min ||p(.4)r^|| = ||r^||.
peVm pePm
Combining both results, we conclude that
l|r(m)|| = r-ll,
which proves the first part of the lemma.
To prove the second part of the lemma, we consider the following equalities:
||gTO(A*)r(0)!| = min ||p(/l*)r(0)|| = ||f(m)|| = ||r(m)|| =. min ||p(.4)r(0)||
peVm peVm
By uniqueness of the GMRES polynomial [36, Theorem 2], we conclude that
Pm(*0 = Qm(z).
15


The previous lemma is a general result for the full GMRES, which states
that, given a nonsingular normal matrix A and an initial residual vector r^\
GMRES applied to A with produces the same convergence curve as GMRES
applied to A* with r^0). In the framework of restarted GMRES, Lemma 2.5
implies that the cycles GMRES(A, m, and GMRES(A*, m, result
in, respectively, residual vectors and f^ that have the same norm.
Theorem 2.6 (the sublinear cycle-convergence of GMRES(m))
Let {r(fc)}^_0 be a sequence of nonzero residual vectors produced by GMRES(m)
applied to system (2.1) with a nonsingular normal matrix A Cnxn, 1 < m <
n 1. Then
||r(fc+i)|
(fc)i
<
k = 1.
(2.12)
||^.(fc1)|| |||.(fc)||
Proof: Left multiplication of both parts of (2.7) by K* (.4, ?4fc-1)) leads to
1
ei =
|r(k) I)'
;K* (A,r(fe-1})
(fc-m _(fe)
By (2.8) in Lemma 2.4, we factorize the Krylov matrix K (.4,r^ in the
previous equality:
ei
1
,rW||2
1
|| rtfc) |
;Z*Dk-ldk-
Applying complex conjugation to this equality (and observing that e\ is real),
we get
ei =
1
;ZT Dk-idk-
||r(fc)||2
According to the definition of Dk-i in Lemma 2.4, D^-idk Dkdk-i, thus
\\r(k)\\2ZTDkdk-1 = ||r(fc)||2 (ZTDkV*) r(k 1} = ||r(fc)||2 DkZ)
ei
(fc-i)
16


From (2.8) and (2.11) we notice that
K (A\r{k)) = K (VAV\r{k)) = VDkZ,
and so therefore
ei = t4'-r' <2-13)
Considering the residual vector r^-1^ as a solution of the underdetermined
system (2.13), we can represent the latter as
r(fc-D = ||r(fc)||2 (K* (A*,r(fc)))te1 + wk, (2.14)
where wk G null (K* (A*, r(fc^)). We note that since /4fe+1) is nonzero (as-
sumption in Theorem 2.6), the residual vector f^k+l^ at the end of the cycle
GMRES(j4*, m, r^) is nonzero as well by Lemma 2.5; hence the corresponding
Krylov matrix K (A*,r^) is of the full rank, and thus the pseudoinverse in
(2.14) is well defined. Moreover, since
wk(K* (T*,r))fei,
using the Pythagorean theorem we obtain
||r(fc-1>||2= ||r(fc)||4|| (K* (A*,rW))te1||2 + |K||2.
Now, since (K*(A*, r(fc)))f = (Kf(A*, r(fc))*), we get
||r(^)||2 = ||r||4|| (K* (A\r^))* ei\\2 + |M|2,
and then by (2.7),
llr(fc)ll4
j|f(fc+l)||2

>
||rW ||4
||f(fc+l) ||2
17


Figure 2.1: Cycle-convergence of GMRES(5) applied to a 100-by-100 normal
matrix.
where is the residual vector at the end of the cycle GMRES(A*, m, r^).
Finally,
||r(fc)||2 ||r(fc)||2||^(A-+l)||2 ||f(fc+l)||2
||r(fc-l)||2 - ||rW||4 = ||r(*)||2
so that
4k) \
4k+1)1
llr(fc-l)ll llr(fc)ll (2-15)
By Lemma 2.5, the norm of the residual vector f^k+1'> at the end of the cycle
GMRES(j4*, m, r^) is equal to the norm of the residual vector at the
end of the cycle GMRES(2l, m, r^), which completes the proof of the theorem.
Geometrically, Theorem 2.6 suggests that any residual curve of a restarted
GMRES, applied to a system with a nonsingular normal matrix, is nonincreasing
and concave up (Figure 2.1).
Corollary 2.7 (cycle-convergence of GMRES(m)) Let ||r^|| and |jr-G)j|
be given. Then, under assumptions of Theorem 2.6, norms of the residual vectors
18


(2.16)
r^ at the end of each GMRES(m) cycle satisfy the following inequality
Proof: Directly follows from (2.12).
Inequality (2.16) shows that we are able to provide a lower bound for the
residual norm at any cycle k > 1 after performing only one cycle of GMRES(ra),
applied to system (2.1) with a nonsingular normal matrix A.
Prom the proof of Theorem 2.6 it is clear that, for a fixed k, the equality
in (2.12) holds if and only if the vector in (2.14) from the null space of the
corresponding matrix K* (A*, r^) is zero. In particular, if the restart parameter
is chosen to be one less than the problem size, i.e., m n 1, the matrix
K* (A*, r^) in (2.13) becomes an n-by-n nonsingular matrix, hence with a zero
null space, and thus inequality (2.12) is indeed an equality if m = n 1.
We now show that the cycle-convergence of GMRES(n 1), applied to
system (2.1) with a nonsingular normal matrix A, can be completely determined
by norms the of the two initial residual vectors r(0^ and
Corollary 2.8 (the cycle-convergence of GMRES(n 1)) Let us suppose
that ||r^|| and ||r^)|| are given. Then, under the assumptions of Theorem 2.6,
norms of the residual vectors r^ at the end of each GMRESfn 1) cycle obey
the following formula:
(2.17)
Proof: Representation (2.14) of the residual vector r^k ^ for m = n 1
turns into
(2.18)
19


implying, by the proof of Theorem 2.6, that the equality in (2.12) holds at each
GMRES(n 1) cycle. Thus,
r(fc+i)
k = 11.
We show (2.17) by induction in k. Using the formula above, it is easy to
verify (2.17) for ||r(2)|| and jjr^3^[j (k 1,2). Let us assume that for some k,
3 < k < q 1, ||r(fc-1)|| and ||r(fc)|| can also be computed by (2.17). Then
(fc+i)ll iir(U
lu +1 II
\\rW\\
||r(fc-i)||
||r
= ilra)|i (O)*'1 (JLLi) = 11^)11,
11 "lir>n; l|r vtir
(i)i
.(i)l
fc-i
(\\r^\\\k 1
-(0)1
iir<1)n (IKS|)2
(i)|
Thus, (2.17) holds for all k = 1,..., q 1.
Another observation in the proof of Theorem 2.6 leads to a result from
Baker, Jessup, and Manteuffel [6]. In this paper, the authors prove that, if
GMRES(n 1) is applied to a system with Hermitian or skew-Hermitian matrix,
the residual vectors at the end of each restart cycle alternate direction in a cyclic
fashion [6, Theorem 2]. In the following corollary we (slightly) refine this result
by providing the exact expression for the constants ak in [6, Theorem 2],
Corollary 2.9 (the alternating residuals) Let {t^}^=0 be a sequence of
nonzero residual vectors produced by GMRES(n 1) applied to system (2.1)
with a nonsingular Hermitian or skew-Hermitian matrix A G Cnxn. Then
ll (fc+i)||2
r<+1> = atr 20


Proof: For the case of a Hermitian matrix A, i.e., A* = A, the proof
follows directly from (2.7) and (2.18).
Let A be skew-Hermitian, i.e., A* = A. Then, by (2.7) and (2.18),
r(*-i) = (K* (A*,rw)) 1e1 = (K* (-A,r(fe))) 1 ex =
||r(fc)||2
if^+bip5
?(*+!)
where f(fe+1) is the residual at the end of the cycle GMRES(A, n 1, r^).
According to (2.3), the residual vectors Afc+1) and at the end of the
cycles GMRES(A, n 1, r^) and GMRES(A, n 1, r^) are obtained by
orthogonalizing r^ against the Krylov residual subspaces AK.n-\ (A, r^) and
(A) fCn-i (A, r^), respectively. But (A)/Cn_i (A,r^) = AKn-\ (A,r
and hence f^'+1^ = Ak+1\
In general, for systems with nonnormal matrices, the cycle-convergence be-
havior of the restarted GMRES is not sublinear. In Figure 2.2, we consider
a nonnormal diagonalizable matrix and illustrate the claim. Indeed, for non-
normal matrices, it has been observed that the cycle-convergence of restarted
GMRES can be superlinear [81].
In this concluding subsection we restrict our attention to the case of a diag-
onalizable matrix A,
A = VKV~\ A* = V~*AV*.
(2.20)
The analysis performed in Theorem 2.6 can be generalized for the case of a
diagonalizable matrix [79], resulting in inequality (2.15). However, as we depart
from normality, Lemma 2.5 fails to hold and the norm of the residual vector f(fc+1)
at the end of the cycle GMRES(A*, m, r(fe)) is no longer equal to the norm of the
vector r(fc+1) at the end of GMRES(A, m, r(fc)). Moreover, since the eigenvectors
21


Residual Curve
Rate of Convenience Curve
10'
10 15 20
GMRES(m) cycle number
25
30
10 15 20
GMRE$(m) cycle number
25
30
Figure 2.2: Cycle-convergence of GMRES(5) applied to a 100-by-100 diago-
nalizable (nonnormal) matrix.
of A can be significantly changed by transpose-conjugation, as (2.20) suggests,
the matrices A and ^4* can have almost nothing in common, so that the norms of
p(/c+i) ancj r(k+i) ar6j possibly, far from being equal. This creates an opportunity
to break the sublinear convergence of GMRES(m), provided that the subspace
AJCm (A, rresults in a better approximation (2.3) of the vector r^ than the
subspace A*K,m (A*,r^).
It is natural to expect that the convergence of the restarted GMRES for al-
most normal matrices will be almost sublinear. We quantify this statement
in the following theorem.
Theorem 2.10 Let 0 be a sequence of nonzero residual vectors produced
by GMRES(m) applied to system (2.1) with a nonsingular diagonalizable matrix
A E Cnxn as in (2.20), 1 < m < n 1. Then
where a = crm^n(R), pk = \\pk{A)(I VV*)r{kl\\, pk(z) is the polynomial con-
structed at the cycle GMRESfA, m, r(fc)), and where q is the total number of
(2.21)
22


GMRES(m) cycles. We note that 0 < a > 1 and 0 < /?*. 0 as V*V > I.
Proof: Let us consider the norm of the residual vector f^+1^ at the end of
the cycle GMRES(A*, m, r^). Then we have
||f(fe+1)|| = min ||p(A*)r(fe)|| < ||p(A*)r(fc)||,
peVm
where p(z) Vm is any polynomial of degree at most m such that p(0) = 1.
Then, using (2.20),
||r(fc+1)|| < ||p(A*)r(fc)||
= ||p->(A)VrV(fc)||
= ||P'XA)(p-1y)P*r(fc)||
= ||P-*p(A)p-:(PP*)r(fc)||
= \\V~*p(H)V~1(I -(I- 1/Vr*))r(fc)||
- ||V-*p(A) (V~lrk V~\I PP*)rW) ||
< ||V||||p(A) (y-1^ V~\l VV*)rw) ||.
We note that
||p(A) (y-V(fc) V~\I VV*)r{k)) II = ||p(A) (y-V(fc) V~\I VV*)rW) ||.
Thus,
||f(fc+1)|| < ||y-*||||p(A) (y-V(fc) v~\i vv*)r{k)) ||
= l|v-i||(y_1y)p(A) (y-Vfc) v~\i vv*)r^) y
< IIV IIIIV-1 IIII^(A)!^-1^ Vp(A)V-\l VV*)r(fc)ll
= -p(vhv-')(i vvVn
min \ /
< -T^rrr, (IIp(^)>-,II + \\p(A)(I W)r<)||),
23


where amin is the smallest singular value of V.
Since the last inequality holds for any polynomial p(z) Vm, it also holds for
p(z) = pk(z), where Pk(z) is the polynomial constructed at the cycle GMRES(A,
m, r^). Hence,
Illll < -Y^TTTs (li>-(l+1)ll + IW>t)U W>)ll)
min V )
Setting a = 2 1(vj, Pk = \\pk(A)(I VT/*)r^||, and observing that a > 1,
6k > 0 as V*V > /, from (2.15) we obtain (2.21).
2.2 Any admissible cycle-convergence behavior is possible for the
restarted GMRES at its initial cycles
In the previous section, we have characterized the cycle-convergence of the
restarted GMRES applied to system of linear equations (2.1) with a normal
coefficient matrix. Now we turn our attention to the general case. The main
result of the current section is stated as the following
Theorem 2.11 Let us be given an integer n > 0, a restart parameter m (t) <
m < n), and a positive sequence {f(k)}9k=0, such that /(0) > /(1) > >
f(s) > 0, and f(s) = f(s + 1) = ... = f(q), where 0 < q < n/m, 0 < s < q.
There exists an n-by-n matrix A and a vector r^ with ||r^0^ j| = /(0), such that
||r(fc)|| __ f(k), k = 1 ,...,q, where r^ is the residual at cycle k of restarted
GMRES with restart parameter m applied to the linear system Ax b, with
initial residual r^ = b Ax^. Moreover, the matrix A can be chosen to have
any "desired (nonzero) eigenvalues.
The full GMRES has a nonincreasing convergence for any i > 0, f(i) >
f(i + 1) and it computes the exact solution in at most n steps (f(n) = 0).
24


We note that the assumptions on {f(k)}k~l in Theorem 2.2 do not cover the
class of convergence sequences corresponding to the convergence to the exact
solution before step n. One can see, however, that these assumptions axe suffi-
cient to conclude that the theorem holds in this case as well. In this sense it is
remarkable that Greenbaum, Ptak, and Strakos are able to prove that any ad-
missible convergence behavior is possible for the full GMRES at its n steps. At
the same time we would like to note that the cycle-convergence of the restarted
GMRES can have two admissible scenarios: either f(i) > f(i + 1) for any i,
in other words, the cycle-convergence is (strictly) decreasing; or there exists s
such that f(i) > f(i + 1) for any i < s, and then f(i) = f(s) for any i > s,
in other words, if the restarted GMRES stagnates at cycle s + 1, it stagnates
forever. Thus assumptions on {f(k)}gk==0 in Theorem 2.11 reflect any admissible
cycle-convergence behavior of restarted GMRES at the first q cycles, except for
the case where the convergence to the exact solution happens within these q
cycles. It turns out that the assumptions are sufficient to guarantee that The-
orem 2.11 also holds in the above mentioned case of early convergence. In
Subsection 2.2.6, we point out how exactly the assumptions of Theorem 2.2 and
Theorem 2.11 allow us to conclude that any admissible convergence behavior is
possible for the full and restarted GMRES (at its q initial cycles).
As mentioned above, the maximum number of iterations of the full GMRES
is at most n, and the method delivers the exact solution in a finite number of
steps. The restarted GMRES, however, may never provide the exact solution.
It (hopefully) decreases the residual norm at each cycle, that is, provides a more
and more accurate approximation to the exact solution. With n2 parameters
25


in A and n parameters in b we are not able to control the convergence for an
infinite amount of cycles. For this reason, we consider only the first q < n/m
initial GMRES(m) cycles. We note that, in practice, n >> m so q is relatively
large.
The rest of this section concerns the proof of Theorem 2.11. The proof we
provide is constructive and directly inspired by the article of Greenbaum, Ptak,
and Strakos [34]. Although Greenbaum, Ptak, and Strakos laid the path, there
are several specific difficulties ahead in the analysis of the restarted GMRES.
Let n be a matrix order and m a restart parameter (ra < n), A =
{Ai, A2,... A} cC\{0} be a set of n nonzero complex numbers, and {f(k)}qk=0
be a positive sequence, such that /(0) > /(1) > > f(s) > 0 and
f(s) = f(s + 1) = ... = f(q), where 0 < q < n/m, 0 < s < q.
In this section we construct a matrix A Crtx and an initial residual vector
r-() = b Ax^ e C" such that GMRES(m) applied to system (2.1) with the
initial approximate solution produces a sequence {xof approximate
solutions with corresponding residual vectors {r^}k=0 having the prescribed
norms: ||r^|| = f(k). Moreover the spectrum of A is A.
For clarity, we first restrict our attention to the case of the strictly de-
creasing cycle-convergence, and, in Section 2.2.2, prove Theorem 2.11 under the
assumption that /(0) > /(1) > > f(q) > 0 (i.e., we assume that s = q).
Next, in Section 2.2.3, we complete the proof of Theorem 2.11 by handling
the (remaining) case of stagnation, i.e., /(0) > /(1) > < > f(s) > 0 and
f(s) = f(s + l) = ... = f(q),0 proof for the considered case of the strictly decreasing cycle-convergence.
26


2.2.1 Outline of the proof of Theorem 2.11
The general approach described in this paper is similar to the approach of
Greenbaum, Ptak, and Strakos [34]: we fix an initial residual vector, construct
an appropriate basis of Cn, and use this basis to define a linear operator A.
This operator is represented by the matrix A in the canonical basis. It has the
prescribed spectrum and provides the desired cycle-convergence at the first q
cycles of GMRES(m). However, the presence of restarts somewhat complicates
the construction: the choice of the basis vectors, as well as the structure of the
resulting operator A, becomes less transparent. Below we briefly describe our
three-step construction for the case of the strictly decreasing cycle-convergence
and then suggest its easy modification to prove the general case, which includes
stagnation.
At the first step we construct q sets of vectors = {w[k\..., wl^},
k = 1,..., g, each set Wm is the orthonormal basis of the Krylov residual
subspace AK,m (A,r^_1^) generated at the k-th GMRES(m) cycle such that
span Wf]1 = AtCj (A, r(fc_1)) j = 1,..., m. (2.22)
The orthonormal basis W needs to be chosen in order to generate residual
vectors r^ with the prescribed (strictly decreasing) norms f(k) at the end of
each cycle subject to the additional requirement that the set of mq + 1(< n)
vectors
/r() ,(!) r(i) ,(2)
V > a l wm-D ' > W1 > j wm_1, . I ,
,(?)
> wm\i ' /
(2.23)
is linearly independent.
27


Once we have the set S, we will complete it to have a basis for Cn. If the
number of vectors in S is less than n, a basis S of C" is obtained by completion
of S with a set S of n mq 1 vectors, i.e., S = {5, representation of C" as the direct sum
Cn = span S = span{r^, w£i:} span{r^_1\ span{r^, (2.24)
The latter translates in terms of Krylov subspaces into
C" = span S = K-m (A, r^) Km (A, span{r^, At the second step of our construction, we define a linear operator A :
Cn > C" with spectrum A which generates the Krylov residual subspaces in
(2.22) at each GMRES(m) cycle, by its action on the basis vectors S, such that
the desired matrix A is the operator As representation in the canonical basis.
The third step accomplishes the construction by a similarity transformation.
In the following subsection we show that this three-step approach indeed
allows us to prove Theorem 2.11 in the case of a strictly decreasing positive
sequence {f(k)}gk_Q. In order to deal with the particular case of stagnation, i.e.,
/(0) > /(1) > > f(s) > 0 and f(s) = f(s + 1) = ... = /( same framework but set q = s + 1 and redefine the vector (r^ is the last
vector in (2.23)). More details are provided in Subsection 2.2.3.
2.2.2 Proof of Theorem 2.11 for the case of a strictly decreasing
cycle-convergence
Throughout this subsection we let the positive sequence {f(k)}qk=0 only to be
strictly decreasing. We also assume here that q = max {z e Z : z < n/m}. This
28


means that for the given n and m we perform our construction along the largest
number of initial cycles where we are able to determine A (having a prescribed
spectrum) and r^ which provide the desired cycle-convergence. Although our
proof is formally valid for any 0 < q < n/m, the assumption emphasizes the
extent to which we can take control over the process. We note that any case
with q < max {z 6 Z : z < n/m} can be extended to the one assumed above by
properly defining a number of additional elements in {f(k)}9k=z0.
Step 1: Construction of a sequence of Krylov subspaces which provide
the prescribed cycle-convergence
At the kth GMRES(m) cycle, the residual vector r^ satisfies minimality
condition (2.3). We assume that each set Wm* is an orthonormal basis of a cor-
responding Krylov residual subspace A}Cm (A,r(k~1'>), therefore condition (2.3)
implies
r(k) = r(k-1) ^(r(fc-i)>w^)wf\ k=l,...,q. (2.25)
j'=i
At this stage, in order to simplify the forthcoming justification of the linear
independence of the set S, we impose a stricter requirement on the residual
change inside the cycle. We require that the residual vector remains
constant during the first m 1 inner steps of GMRES and is reduced only at
the last, mth, step. Thus, the equality in (2.25) can be written as
r(k) =r(k-i) _ k = l,...,q. (2.26)
29


This implies that the vectors wf\ j = 1, are orthogonal to the
residual vector i.e.,
(r(fc_1\ Wjk^) = 0, j = 1,... ,m 1, k=l,...,q. (2.27)
Prom (2.26), using the fact that r^ J. w$ and the Pythagorean theorem,
we obtain
|(r<*-1>,<))| = A/||r(*-0||2-||r(*)||2,
By defining (acute) angles ipk = /(r^-1\ w$) and the corresponding cosines
cos uy = ; we can equivalently rewrite the identity above in the fol-
lowing form:
cos ipk
!u.
Vf(k-ir-mr
nk-i)
(0,1), k=l,...,q,
(2.28)
where f(k 1) and f(k) are the prescribed values for the norm of the residual
vectors and r^, respectively. Thus, if we are given one way to
ensure the desired cycle-convergence at cycle k of GMRES(ra) is to choose the
unit vectors such that (2.26)-(2.28) holds.
In the following lemma, we show constructively that the described approach
(2.26)-(2.28) leads to an appropriate set S.
Lemma 2.12 Given a strictly decreasing positive sequence {f{k)}k=0 and an
initial vector r^\ ||r^|| = /(0), there exist vectors r^k\ ||r(fc)|| = f(k) and
orthonormal sets Wm^ such that (2.26), (2.27), and (2.28) hold, and the set S
in (2.23) is linearly independent, k = 1,... ,q.
30


Proof: The proof is by induction. Let k = 1. Given the initial vector r^\
llr^l| = /(0), we pick = {ic^, ,^m-i) an orthonormal set in r^ in
order to satisfy equalities (2.27). The set is linearly independent.
In order to choose the unit vector orthogonal to the previously con-
structed vectors and satisfying (2.28), we introduce a unit vector 6
{r(\ so that
w,
(i) _
~(o)
m
cos-01 + f/^sim/q.
We find the vector rb) by satisfying (2.26). Equality (2.28) guarantees that
llr^H = /(1), as desired. Finally, we append the constructed vector rb) to
{r(), an(j get ge{. |r()) W^!, r(i)}, which is linearly independent,
since, by construction, r^1) is not in span{r^\
The induction assumption is that we have already constructed k 1 vectors
r^\... with the prescribed norms /(l),...,/(fc 1) and orthonormal
sets , Wm_1\ such that equalities (2.26), (2.27) and (2.28) hold, and
the set
/r() yy(1). r(k-2) yp(fe-0 r(k-i)\
> vm1 > ' > Ym1 ) ' J
(2.29)
is linearly independent. We want to show that we can construct the next vec-
tor r^k\ ||r^|| = f{k), and the orthonormal set Wm\ satisfying (2.26), (2.27)
and (2.28), such that
/r() yy(1), r(fe_2) W(fcT1} r(fc_1) W(fc\ r(fe)}
X > vm\i i 1 i vm 1 > ' i y vm 1 > ' /
(2.30)
is linearly independent, k < q.
We start by constructing orthonormal vectors W^li = {w[k\ ...,
(k)
satisfying (2.27), with the additional requirement that the set is not
31


in the span of the previously constructed vectors given in set (2.29). From
these considerations, we choose W,^ as an orthonormal set in the orthogonal
complement of (2.29), i.e.,

,r
(fc-2) yV(fc-1)
i vm-1 ' S i
j = l,...,m- 1.
Appending to the set (2.29) gives a linearly independent set.
To finish the proof, we need to construct the vector Wm\ satisfying (2.28)
and orthogonal to For this reason we introduce a unit vector y^k\
y
(k)
e (r(0) W(1), r(fc-2) r(fe_1)
c l' y vm 1 ) i vm1 > ' i Vm\S >
(fc-1)
>(*)
so that,
w
(*) JL_____________
f(k -1)
cos^fc + ?/(fc)sin^fc,
where cos ipk = 1(r<|*r2^))|
We define the vector with (2.26). Equality (2.28) guarantees ||r^|| =
f(k). Set (2.30) is linearly independent, since, by construction, the vector r
is not in span{r^\ .... r^fc_2\ W^lj}.
Step 2: Definition of a linear operator with any prescribed spectrum
So far we have shown that, given a strictly decreasing positive sequence
{/mu an<^ an i^ial residual vector r^\ ||r^|| = /(0), it is possible to con-
struct vectors r^k\ ||r^|| = f(k), and orthonormal vectors Wm\ k = 1
satisfying equalities (2.26), (2.27) and (2.28), such that the set S of mq + 1
vectors in (2.23) is linearly independent.
In order to define a (representation of) unique linear operator, we need
to have a valid basis of Cn at hand. Thus, we expand the set S by linearly
32


independent vectors assumed that q = max {2 6 Z : 2 < n/m}):

(g-i) v\/?) Art
(2.31)
so that S is a basis of Cn.
Before we define a linear operator A, let us consider the set
A {Al5 A2,.. , A}
of nonzero numbers in the complex plane that defines the spectrum of A. We
split A into q+ 1 disjoint subsets
A = {Ai, A2,..., Ag, A9+i}, (2.32)
such that each A^, k = 1contains m elements of A, and the remaining
n mq elements are included into A9+i-
For each set A/-, k = 1,..., q, we define a monic polynomial Pk{x), such that
the roots of this polynomial are exactly the elements of the corresponding A/.:
m1
pk(x) = xm ^2 af] x3, k = l,...,q, (2.33)
i=0
with being the coefficients of the respective polynomials, 7^ 0- Each
polynomial Pk{x) in (2.33) can be thought of as the characteristic polynomial of
an m-by-m matrix with spectrum Ak-
Let us also introduce an arbitrary (t + l)-by-(t + 1) matrix C with the
spectrum Ag+1:
C= 0%), A(C) = Ag+1, i,j l,...,t + l = n mq. (2.34)
33


We define the operator A : C71
Cn as follows:
Ar^k ^w[k\
Aw^ = w^\
(k) (k)
w
m1 >
^m-2
(fc) (fc)
tj w\
Ar{g) = 0n riq) + (32iSi + + A+1,1 Si,
Asi = + P22S1 + + Pt+l,2Sti
Aw!£L 1 = -a{0k)r(k) + a(0k)r(k 1} + a\k)w[k) H----------h a^w^, k = l,...q\
(2.35)
Ast Pl,t+lT^ + ,$2,t+lsl + + fit+l,t+lst)
where 0^'s are the coefficients of polynomials (2.33) and /5ys are the elements
of the matrix C in (2.34).
The following lemma showrs that, given vectors and orthonormal sets
constructed according to Lemma 2.12, the linear operator A, defined by
(2.35) and represented by a matrix A in the canonical basis, generates the desired
Krylov residual subspaces given in (2.22); and the spectrum of A can be chosen
arbitrarily.
Lemma 2.13 Let the initial residual vector ||r^|| = /(0), as well as
the residual vectors r^ and orthonormal sets be constructed according to
Lemma 2.12. Let S be the basis ofCn as defined by (2.31) and A be an arbitrary
set ofn nonzero complex numbers. Then the linear operator A defined according
to (2.32)-(2.35) generates the Krylov residual subspaces given in (2.22), where
34


the matrix A is a representation of A in the canonical basis. Moreover, the
spectrum of A is A.
Proof: From definition (2.35) of the linear operator A one can notice that
Ar{-k~l'> = w[k\
AVk~V = w{k\
Am-lr(k-l)=w(k)^
Amr(k-1) _Q,Wr(fe) _|_ +-----h k 1,.. .q.
Since, by construction in Lemma 2.12, i.e., equality (2.26),
0 7^ a^r^ + span{u>^},
the above relations immediately imply that for each k 1,... q,
span{>lr^ , AM^ ^} = span j = 1,...,
m.
Thus, given the representation A of the linear operator A in the canonical basis,
we have proved that A generates the Krylov residual subspaces given in (2.22).
35


To prove that an arbitrarily chosen set A can be the spectrum of A, let us
consider the matrix [Xj5 of the operator A in the basis S (see (2.31) and (2.35)):
00-- 0 o1}
10-- 0 a[^
01 0 a(2l)
00-- 1 a(1) 1 am-1
-q(1) ao
Ms
' t
(9-1)
0 0 0 a.
(9)
0
(2.36)
1 0 0
01--
(9)
0 a?>
0
0 0 la
(9)
m 1
-a.
(9)
fin
fii;
t+i
fit+1,1 fit+l,t+1
The matrix (y4.]5 has a block lower triangular structure, hence the spectrum
of [*4]5 is a union of eigenvalues of all diagonal blocks. The first q blocks are
the companion matrices corresponding to the sets A*, k 1,... ,q, with char-
acteristic polynomials defined in (2.33). The last block is exactly the matrix
C (/3ij) from (2.34) with the spectrum A9+i. Thus, given partition (2.32) of
the set A, we conclude that spectrum of A is A.
36


Step 3: Conclusion of the proof of Theorem 2.11 for the case of
the strictly decreasing cycle-convergence
Finally, we define A as the representation of the operator A in the canonical
basis £ = {ei,e2, . ,en},
A = S[A]sS~l, (2.37)
where the square matrix S is formed by the vectors given in (2.31) written
as columns and [.A]^ is defined by (2.36). The constructed matrix A pro-
vides the prescribed (strictly decreasing) norms of residual vectors at the first
q GMRES(ra) cycles if starting with and its spectrum is A. We note that
the field over which the resulting matrix is defined depends heavily on the par-
tition (2.32) of the set A, e.g., A turns out to be (non-Hermitian) complex if a
conjugate pair from A is not included into the same subset Ak.
2.2.3 Extension to the case of stagnation
In the previous subsection, we have proved Theorem 2.11 only for the case
of the strictly decreasing positive sequence {f(k)}qk=0. Now, in order to conclude
the rest of Theorem 2.11, we consider the case of stagnation: /(0) > /(1) >
> f(s) > 0 and f(s) f(s + 1) = ... = f(q). The latter fits well (after a
minor modification) into the framework presented above.
Let us set q s + 1 and, without loss of generality, reduce the problem to
constructing a matrix A with a spectrum A and an initial residual vector A\
||r(0)|| = /(0), for which GMRES(ra) produces the following sequence of residual
norms: /(0) > /(1) > > f(q 1) = f(q) > 0. We observe that the sequence
is strictly decreasing up to the last cycle q. Thus, by Lemma 2.12, at the initial
37


q 1(= s) cycles we are able to construct sets W and vectors r^k\ such that
||r^|| = /(&) and the set
{r(0), ..., r(9_1)} (2.38)
is linearly independent. Then, formally following the construction in Lemma 2.12
at the cycle q, we get orthonormal set Wm* from the orthogonal complement
of (2.38) and the residual vector This leads to set (2.23), which is
no longer linearly independent due to the above mentioned equality of residual
vectors. To enforce the linear independence we substitute in (2.23) the incon-
venient vector by Wm + r^-1) and obtain the set
/r() yyd) r(q-2) yp(9-b r(9-i) yp^) ^(9) + r(9-i)\
l' > vm 1) > ' ) vm 1 > ' ) r vm 1 > wm 1 i
i)
(?) ,,,(9) _l_ -(9-1)1
(2.39)
which is linearly independent, due to the fact that the orthonormal set Wm is
chosen, by construction, from the orthogonal complement of (2.38).
The rest of the proof exactly follows the pattern described in Subsection 2.2.2
with replaced by Wm + q = s + 1; see (2.31)(2.37). The resulting
matrix A has the prescribed spectrum A and with the initial residual vector
r(), ||r()|| = /(0), provides the desired cycle-convergence of GMRES(ra) with
a stagnation starting after cycle s.
This concludes the proof of Theorem 2.11. In what follows we suggest several
remarks and generalizations related to the result.
2.2.4 Difference with the work of Greenbaum, Ptak, and Strakos [34]
For the reader familiar with the work of Greenbaum, Ptak, and Strakos [34],
it might be tempting to obtain the present result by pursuing the following
scheme: fix and then consider the first restarted GMRES cycle as the initial
38


part of a full GMRES run where the convergence is prescribed for the first m it-
erations (and set arbitrarily for the remaining n m iterations). Then, similarly,
given the starting residual vector r^ provided by this first cycle, construct the
next Krylov residual subspace, which provides the desired convergence following
the scheme of Greenbaum, Ptak, and Strakos [34]. Proceed identically for the
remaining cycles. This approach, however, does not guarantee the linear inde-
pendence of the set S in (2.23) and, hence, one meets the problem of defining
the linear operator A. These considerations have been the reason for assump-
tions (2.26), (2.27) on the residual reduction inside a cycle, which have allowed
us to quite easily justify the linear independence of the set S and to control the
spectrum, as well.
2.2.5 Generating examples with nonzero r9+1
We note from definition (2.35) of the operator A in Subsection 2.2.2 that
spanjr^, si,..., s)} is an invariant subspace of A and, hence,
rWGAKw(4,r<)),
where A is the representation of the operator A in the canonical basis and
t n mq 1 (< m, by the assumption that q = max {z e Z : z < n/m} at
the beginning of Subsection 2.2.2). This implies that at the end of the (q + l)st
cycle GMRES(m) converges to the exact solution of system (2.1), i.e., r^9+1^ = 0.
This fact might seem unnatural and undesirable, e.g., for constructing theoretical
examples. The drawback, however, can be easily fixed by a slight correction
of the basis S in (2.31)somewhat similarly to how we handled the stagnation
case in Theorem 2.11.
39


Given residuals r^ and orthonormal sets W constructed according to
Lemma 2.12, instead of considering the set S, we consider the following basis of
C":
S = {r(0\ ..., ,r(g 1),w\Q>,...,w'£>_1,r(q) + ^r(q 1J,si,... ,st},
(2.40)
where 7 ^ 1,0. Here we substituted the basis vector in (2.31) by +
7r(9_1). The vector cannot be represented as a linear combination of
other vectors in 5, since it contains the component r(q\ which is not represented
by these vectors. Hence, S is indeed a basis of Cn. Thus we can define the
operator A by its action on S:
,(<7-1) ,( (?) A Ar(k 1'> = w\k\
Aw 1
(k)
w.
(k)
2 >
Aw
(k) (fc)
m-2 = wm-1.
(*) - a. (*).(*-!) ^ (*),(*)
Aw^i, = ctQJr() + aoV + +
k = 1,... ,q 1;
, (k) {k)
40


Ar^q 1') = w(f\
Awf* w^\
A{r{q)
Aw{£_ 2 - ?n(9) wm-1>
_ (9)
^m-1 II ^ 1 + oC .(9) _|_ ^,r(9 1)) _|_ a^r(9 -i)
+ 4. L_ 0,(9) .(9)
^,r(9-l)) qS II + jr(q~1}) + 021Si + + 0t+ 1,1
-4.s'i (2.41)
Ast Pij+i(r^ + 7r^q + 02,t+is\ + + 0t+i,t+isu
where a^s are the coefficients of the corresponding characteristic polynomials
(2.33) and 0^s are the elements of the matrix C in (2.34). The fact that the
operator A produces the correct Krylov residual subspace at the cycle q, i.e.,
span {Ar^-V,..., Amr( can be observed from the following equalities:
(?)
Aw^-1 = .. a (r(g) + 7r(<7_1)) + QqH-----------------------------------------h i
1+7
M
a0 (r(9) r(9 b -|- (1 + y)r^9 1J) + CTq'V

+ Q1 -I---------r
1+7
a{q)
aG
+ 7
Jm-1
= ^(r^ r^) + + +
1 + 7
where, by (2.41), Aw^_x Amr(-q~^ and, by (2.26), 0 ^ r(?) r(9-1) E
span{n;m^}-
41


The matrix [AJj of the operator *4, defined by (2.41), in the basis S is
identical to (2.36), with the only change of the subdiagonal element to
_Q()
-y^-, 7 A 1,0. Hence, A has the desired spectrum A. The representation A
of the operator A in the canonical basis is then determined by the similarity
transformation in (2.37), with the matrix S formed by vectors from S in (2.40)
written as columns.
Finally, to see that the residual vector Ag+1) is generally nonzero with the
new definition of the operator A, we notice from (2.41) that now
span {r^ + 7r^_1\si,... ,st}
is an invariant subspace of A and, hence,
r(Q) + 7r(?-0 G AJCt+i (A, r(g) + 7r(,?-1)) 7 A -1,0,
or,
r(9) G A/Ct+i (A, r<>) + JCt+2 (A, r^) , (2.42)
where t+1 < m by the assumption that q max {z 6 Z : z < njm}. Due to the
fact that r(9+1) = 0 if and only if r(q) G AJCm (A, r(?)), it suffices to show that the
component of the vector from lCt+2 (A, A9-1)) in representation (2.42) does
not generally belong to AJCm (A,r( that the term from /Ct+2 (A,r(9-1)) in (2.42) contains a nonzero component in
the direction r( is chosen from a specific subspace of C", i.e., expressing in terms of
and vectors Wm by (2.26),
42


r(0)^I, X = span {w$,..., x)} + AKm (A r{q)),
where dimX < q + m 1 < n/m + m 1 < n, provided that 0 < m < n.
2.2.6 Any admissible convergence behavior is possible for full and
restarted GMRES (at its q initial cycles)
As we pointed out at the beginning of the current section, the conver-
gence behavior of the full GMRES in Theorem 2.2 is restricted to the class
of convergence sequences which allow convergence to the exact solution only at
step n, i.e., /(0) > /(1) > > fin 1) > 0 if in) = 0). Similarly, the
cycle-convergence behavior of restarted GMRES in Theorem 2.11 is restricted
to cycle-convergence sequences which exclude the possibility of convergence to
the exact solution within the initial q cycles, i.e., /(0) > /(1) > > f(s) > 0
and f{s) = f(s + 1) = ... = /( Theorem 2.2 and Theorem 2.11 are sufficient for the theorems also to hold if
/(0) > /(1) >> f{n- 1) > 0 and /(0) > /(1) > > f(s) > 0,
/(s) = fis + 1) = ... = fiq), respectively.
Given an integer n > 0, assume that we want to construct an n-by-n matrix
A with a prescribed spectrum A and an initial residual vector r^ (or, equiva-
lently, a right-hand side b, since = b after setting x^ to 0) such that the
full GMRES applied to the corresponding system (2.1) results in the following
convergence pattern: /(0) > /(1) > > /(s 1) > /(s) = /(s + 1) = =
fin 1) = 0, s < n, ||rfc|| = /(A:). The construction is straight-forward. We
first split the set A into two disjoint subsets, say, A = As U A_s, where As
contains s elements from A while the remaining n s elements are included into
43


A_s. Next, by Theorem 2.2 we construct a matrix Aa G Csxs and a right-hand
side bs G Cs = 06 Cs), such that the full GMRES applied to the system
Aax = ba produces the convergence sequence /(0) > /(1) > > f(s 1) > 0
(/(s) = 0), moreover the spectrum of Aa is As. Finally, we define the resulting
matrix A G Cnx" and the right-hand side vector t G C" (x^ = 0 G Cn) as
follows:
f As 0 ^ fh\ bs
A = , b =
^ 0 An_S y W
where An_s G ch1-8)*!"-8) is an arbitrary matrix with a spectrum An_s. It is
easy to see that the full GMRES applied to the system of equations defined by
(2.43) produces the desired sequence of residual norms /(0) > /(1) > >
f(s 1) > f(s) = f(s +1) = = fin 1) = 0, II r<*> || = /(fc), r = b. Clearly
the matrix A in (2.43) has the prescribed spectrum A = As U An_s.
For the restarted GMRES the construction of a matrix A with the spectrum
A and a right-hand side b (x^ = 0) that provide the cycle-convergence sequence
/(0) > /(1) >> f(s 1) > f(s) = f(s + l) = ... = f(q) = 0 is analogous,
s < q, ||r^|| = f(k). Following Theorem 2.11, one constructs a matrix As G
Cmsxms and a right-hand side vector bs G Cms, such that the GMRES(m) applied
to the corresponding linear system produces the cycle-convergence curve /(0) >
/(1) > > f(s 1) > f(s) = 0. The spectrum of As is chosen to coincide with
a subset of ms elements of A. The construction of the matrix A G Cnxn and the
right-hand side be Cn is then accomplished by introducing an (nms) x (nms)
diagonal block with eigenvalues from A, which are not in the spectrum of As,
and expanding the vector b with (n ms) zeros, similarly to (2.43).
44


2.2.7 Restarted GMRES with variable restart parameter
The result of Theorem 2.11 can be generalized to the case where the restart
parameter m is not fixed, but varies over successive cycles according to an a
priori prescribed parameter sequence The proof, basically, repeats
the one in Subsection 2.2.2 with the difference that the constructed operator
A in the corresponding basis has block lower triangular structure with varying
diagonal block sizes mk, rather than the constant size mk = m as in (2.36).
Corollary 2.14 Let us be given an integer n > 0, a sequence {mk}qk=zl, 0 <
n?.fc < n, and a positive sequence {f{k)}qk_0, such that /(0) > /(1) > >
f(s) > 0 and f(s) = f(s + 1) = ... = f(q), where q is defined by the condition
||r^|| /(0) such that ||r^|| = f(k), k = 1 ,...,q, where r^ is the residual
at cycle k of restarted GMRES with a restart parameter varying according to
the sequence {mk}k=1 applied to the linear system Ax = b, with initial residual
r() = b Ax^0K Moreover, the matrix A can be chosen to have any desired
(nonzero) eigenvalues.
2.3 Conclusions
In this chapter we have established several results which address the cycle-
convergence behavior of the restarted GMRES. First, we have proved that the
cycle-convergence of the method applied to a system of linear equations with a
normal coefficient matrix is sublinear, and at best linear. Second, in the general
case, we have shown that any admissible cycle-convergence behavior is possible
There exists an n-by-n matrix A and a vector r^ with
45


for the restarted GMRES at q initial cycles, regardless of the eigenvalue distri-
bution of the coefficient matrix. This leads to the conclusion that no estimates,
which rely solely on the matrix spectrum, can be derived to characterize the
cycle-convergence of restarted GMRES at the first q cycles if the method is
applied to a linear system with a general nonsingular non-Hermitian matrix.
Though in practice q tends to be reasonably large (q < n/m), it remains an
open question if the above mentioned estimates hold at cycles which follow the
n/m-th GMRES(m) cycle.
46


3. Solution of symmetric indefinite systems with symmetric positive
definite preconditioners
We consider a system of linear equations
Ax = b, A A* e Rnxn, be Mn, (3.1)
where the coefficient matrix A is nonsingular and symmetric indefinite, i.e., the
spectrum of A contains both positive and negative eigenvalues.
Linear systems with large, possibly sparse, symmetric indefinite coefficient
matrices arise in a variety of applications. For example, in the form of saddle
point problems (see [10] and references therein), such systems may result from
mixed finite element discretizations of underlying differential equations of fluid
and solid mechanics. In acoustics, large sparse symmetric indefinite systems may
be obtained after discretizing the Helmholtz equation [69] for certain media types
and boundary conditions. Sometimes the need of solving the indefinite problem
(3.1) comes as an auxiliary task within other computational routines, e.g., inner
Newton step in the interior point methods in linear and nonlinear optimization,
see [53], or solution of the correction equation in the Jacobi-Davidson method
[64] for a symmetric eigenvalue problem.
Because of the large problem size, direct methods for solving linear sys-
tems may become infeasible, which motivates the use of iterative techniques for
finding satisfactory approximations to the exact solutions. There is a number
of iterative methods developed specifically to solve symmetric indefinite sys-
tems, ranging from modifications of the Richardsons iteration, e.g., [51, 58, 16],
47


*"
to optimal Krylov subspace methods, see [33, 59]. It is known, however, that
in practical problems the coefficient matrix A in (3.1) may be extremely ill-
conditioned which, along with the location of the spectrum of A to both sides
of the origin, can make the straightforward application of the existing schemes
inefficient due to a slow convergence rate. In order to improve the convergence,
one can introduce a matrix T RTlxri and consider the preconditioned system
TAx = Tb. (3.2)
If T is not symmetric positive definite (SPD), the matrix TA of the pre-
conditioned system (3.2), in general, is not symmetric with respect to any inner
product, implying that the specialized methods for solving symmetric indefinite
systems are no longer applicable and need to be replaced by methods for solving
nonsymmetric systems, e.g., one of the Krylov subspace methods: GMRES or
GMRES(m), BiCG, BiCGstab, QMR, etc (see, e.g., [33, 59]). Though known
to be effective for a number of applications, this approach can have several dis-
advantages. First, in order to maintain the optimality of a Krylov subspace
method, one has to allow the increase of the computational work at every new
iteration, which can become prohibitive for large problems. Second, the conver-
gence behavior of methods for solving nonsymmetric systems may not rely on
possibly accessible (estimated) quantities, such as, e.g., the spectrum of the coef-
ficient matrix (see the corresponding results for GMRES [34, 76]), which makes
it difficult or even impossible to estimate the computational costs a priori.
If T is chosen to be SPD (T = T* > 0) then the matrix TA of precon-
ditioned system (3.2) remains symmetric indefinite, however, with respect to
the T_1-inner product, i.e., the inner product defined by (x,y)T-i (x,T~1y)
48


for any x,y G Mn, where (,) denotes the Euclidean inner product, in which
the matrix A is symmetric. In particular, due to this symmetry preservation
(though in a different geometry), system (3.2) can be solved using an optimal
short-term recurrent Krylov subspace method, e.g., preconditioned MINRES
[55], or PMINRES, with the convergence behavior fully described in terms of
the (estimates of) spectrum of the preconditioned matrix TA. Therefore, in the
light of the discussion above, the choice of a properly defined SPD precondi-
tioner for solving a symmetric indefinite system can be regarded as natural and
favorable.
The goal of this chapter is twofold. First, we describe a hierarchy of meth-
ods, from a stationary iteration to an optimal Krylov subspace minimal residual
method, which allow solving symmetric indefinite linear system (3.1) with an
SPD preconditioner T. Second, we suggest a new strategy for constructing
SPD preconditioners for general symmetric indefinite systems being solved with
the described methods. Although the approaches, underlying the methods are
mostly known, e.g., the minimization of an appropriate norm of the residual
vector over a subspace, several of our observations seem to be new and are,
primarily, of a theoretical interest. In particular, we determine the smallest pos-
sible Krylov subspace, which can be used to properly restart the preconditioned
minimal residual method, applied to a symmetric indefinite linear system. For
example, this leads to a scheme which is a natural analogue of the preconditioned
steepest descent iteration for solving SPD systems. Independently of a particu-
lar implementation scheme, we state and prove simple convergence bounds, and
gain an insight into the structure of the local subspaces which determine a new
49


approximation at each step of the selected method. The results of this chap-
ter will motivate the construction of trial subspaces and SPD preconditioners
for symmetric eigenvalue (with a targeted interior eigenpair) and singular value
problems in Chapter 4 and Chapter 5.
In Section 3.1, we present the simplest iterative scheme with stationary iter-
ation parameters for solving a symmetric indefinite system with an SPD precon-
ditioner. Other methods are obtained essentially by allowing to vary the parame-
ters at each step of this stationary iteration in a way that a preconditioner-based
norm of the residual vector is minimized. In Section 3.2, we present a notion of
the optimal SPD preconditioner for a symmetric indefinite system, and suggest
constructing preconditioners based on an approximation of the inverse of the
absolute value of the coefficient matrix (absolute value preconditioners). We
show on the example of a linear system with a discrete real Helmholtz operator
(shifted Laplacian) that such preconditioners can be constructed in practice.
Moreover, the use of the preconditioning techniques based, e.g., on multigrid
(MG), can make this construction efficient.
3.1 Iterative methods for symmetric indefinite systems with SPD
preconditioners
Given a (possibly nonsymmetric) matrix A £ R71*, an initial guess £
R", a (nonzero) iteration parameter a £ R, and a (possibly non-SPD) precon-
ditioner T £ Rnxn, the iteration of the form
x(<+1) = x(i) + aw{i), w(i) = Tr{i), r(i) = b Ax{i), i = 0,1,...; (3.3)
where xS^ £ R71 is an approximation to the exact solution of system (3.1) at
iteration i, is commonly referred to as a preconditioned stationary iteration,
50


or Richardsons method with a stationary iteration parameter, see, e.g., [3, 33,
59]. We note that stationary iteration (3.3) can be considered as a simplest
iterative method for solving linear systems, and, if properly preconditioned, can
be efficient and computationally inexpensive.
In general, the (asymptotic) convergence rate of method (3.3) is governed
by the spectral radius of the iteration matrix M = I aTA, where T and a
(sometimes skipped after replacing aT by T) need to be chosen to make the
spectral radius of M strictly less than 1, see, e.g., [3, 33]. If both A and T are
SPD, one can always find a sufficiently small value of the step-size parameter
a > 0 to ensure that iteration (3.3) monotonically and linearly reduces the A-
norm of error. Moreover, a can be set to the value which provides an optimal
convergence rate for the method with an optimal convergence factor popt =
/c(T/l)+i < where the condition number k(TA) is a ratio of the largest and the
smallest eigenvalues of the preconditioned matrix TA (for more details see, e.g.,
[3])-
If A is symmetric indefinite and T is SPD, stationary iteration (3.3), in
general, diverges for any choice of the parameter a.
Proposition 3.1 Stationary iteration (3.3) applied to linear system (3.1) with
a symmetric indefinite matrix A and an SPD preconditioner T diverges for any
a, unless the initial guess x^ is specifically chosen.
Proof: Let us consider the preconditioned residual, corresponding to iteration
(3-3):
rr(i+i) = (/ aTA)i+1 7V(0), i = 0,1,.... (3.4)
51


Since T is SPD and A is symmetric indefinite, the preconditioned matrix
TA is T_1-Symmetric and indefinite with eigenpairs (Aj,Vj), where the (real)
eigenvalues Aj are of both signs and the eigenvectors yj are T_1-orthonormal,
j = 1,..., n. Then the eigenpairs of the iteration matrix / aTA are (/ij, yj)
where /j,j 1 aXj.
Let Cj be the coordinates of the preconditioned initial residual vector Tr^
in the basis of the eigenvectors yj. Assume that the initial guess is chosen in
such a way that Tr^ has at least two nontrivial components in the directions
of eigenvectors corresponding to eigenvalues of TA of the opposite sign. Then
we can always fix an eigenvalue Xj, of the matrix TA such that sign(AjJ =
sign(a), for any (nonzero) a, and the vector Tr^ has a nontrivial component
in the direction of t/j, i.e., Cj ^ 0. Thus, since the corresponding eigenvalue
(J.j, = 1 a\j. of I aT A is strictly greater than 1, using identity (3.4) and the
Pythagorean theorem with respect to the TMnner product, we obtain:
||r(*+1)||2 = ||7Y(i+1)|fr-i = || (/ aTA)i+1Tr^\\2T^
= n (/- qt.4)'+i = UX^crf-,
j=o j=o
= it (4fl^)2 > ¥£1ct)2 ->
3=0
as i > oo. This proves the divergence of iteration (3.3) with A symmetric
indefinite and T SPD for any a, unless the initial guess x^ is such that the
vector Tro has its nontrivial components only in the direction of eigenvectors
corresponding to eigenvalues of the same sign.
Since iteration (3.3) is not applicable for solving symmetric indefinite sys-
tems with SPD preconditioners, we further question what a correct way is to
52


define a simple scheme with stationary iteration parameters, different from us-
ing method (3.3) for the normal equations, that can be applied in the described
framework.
3.1.1 Stationary iteration for solving symmetric indefinite systems
with SPD preconditioners
Given a symmetric indefinite matrix A and an SPD preconditioner T, we
consider the iteration of the form
rM =b Ax^\ wW = Tpl\ sW = TAw^\ Pw^,
(3.5)
X(H-!) = x(*) -f q/W, 2 = 0, 1, ... ,
where a (nonzero) and P are real numbers. Scheme (3.5) can be viewed as
preconditioned stationary iteration (3.3) with a search direction replaced
by a modified direction which is a linear combination of the preconditioned
residual and a vector = TAwh). We notice that method (3.5) is exactly
stationary iteration (3.3), applied to solve the (preconditioned, T_1-Symmetric)
system
(TA pi) TAx = (TA pi) Tb, (3.6)
or, equivalently, the symmetric system (AT PI) Ax (AT pi) b with the
preconditioner T. Alternatively, (3.6) can be viewed as an instance of a polyno-
mially preconditioned system, see, e.g., [59].
With P = 0, method (3.5) turns into iteration (3.3) applied to the sys-
tem of the normal equations (TA)2x TATb, or, equivalently, to the system
AT Ax = ATb, with an SPD matrix AT A, preconditioned with T. The optimal
choice of the parameter a in this case leads to the (optimal) convergence rate
53


with a factor p^ We next show that for certain choices of the pa-
rameters a and (3 method (3.5) converges to the solution of system (3.1). More-
over, the (optimal) convergence rate is improved, depending on the eigenvalue
distribution of the preconditioned matrix TA, compared to the above discussed
approach based on solving the corresponding system of normal equations with
method (3.3).
Let us assume that the spectrum of the preconditioned matrix TA, i.e.,
A(TA) = {Al5..., Ap, Ap+i,..., An}, is located within the union of the two in-
tervals
l=[a,b}\J[c,d], (3.7)
where a The following theorem holds:
Theorem 3.2 Let us consider method (3.5) applied to solve linear system (3.1)
with a symmetric indefinite coefficient matrix A and an SPD preconditioner T.
Let us assume that the spectrum of the matrix TA is only known to be enclosed
within the pair of intervals X in (3.7).
If b < (3 < c and 0 < a A e{a.d}
llrh+1)||
.. ||r||r K Ae{a,6,c,d} n
i.e., method (3.5) converges to the solution of system (3.1). Moreover, the con-
vergence with the optimal convergence factor
'd\ f\b\ + d cN
K + 1
|6| / if \a\-\b\ l)(CM^).d-c
(3.9)
54


corresponds to the choice of parameters (3 = (3opt = c \h\ and a = a.opt, where
opt
2/(|6j c + d(\b\ + d c)), if |a| \b\ < d c
2/(\b\ c + jaj (c + |a| \b\)), if |a| \b\ > d c
(3.10)
Proof: As has been mentioned, method (3.5) is exactly stationary itera-
tion (3.3), applied to solve system (3.6), or, equivalently, the symmetric system
(AT (31) Ax = (AT (31) b with the preconditioner T. Thus, in order for
method (3.5) to converge, by Proposition 3.1, the parameter (3 needs to be cho-
sen such that the matrix Sp = (TA (3I)TA in (3.6) is positive definite, i.e., all
the eigenvalues pj of Sp are positive. Since pj = A2 (3\j, where Aj A(TA),
we conclude, by enforcing the parabola p(X) A2 (3\ > 0 on X (and hence on
A (TA) C I), that pj > 0 if b < (3 < c for all j = 1,..., n.
Next we observe that the preconditioned residual, corresponding to a step
of method (3.5), can be written as
Thus, using the derivations similar to those in Proposition 3.1, one gets the
following inequality for the T-norms of the residual vectors at the consecutive
iterations:
Since p(A) = A2 /?A > 0 for A 6 X, provided that b < (3 < c, it is possible to
choose a sufficiently small a in (3.11) such that |1 a(A2 (3A)| < 1 on I; i.e.,
Tr(t+i) = (/ aTA (TA + £/)) Tr(i) = (I- aSp) Tr(i).
t < max
AjgA(TA)
1 a(A2 (3Xj) \ IIt^Ht < max |l ct(A2 /3A)| Hr^Hx.
(3.11)
55


Therefore, the choice b < 8 < c and 0 < a < rp, where t$ = max (A2 8\),
X
implies, by (3.11), that ||r(l+^||7/||r^+1^||r < p < 1, where
p max 11 a(A2 8X) I = max 11 a(A2 8X) I.
Xel 1 1 Ae{a,6,c, This proves convergence bound (3.8).
Finally we determine the choice of the parameters a a^t and 8 = 8^
such that method (3.5) converges to the solution of system (3.1) with an optimal
rate, i.e., with the convergence factor
p Popt = min max 11 a(A2 (3X) I = max 11 Q,p Ae{a,6,c,d} Ag{a,6,c,d}
We note that, for any b < 8 < c, the corresponding optimal value of a =
Qoptifi) is
aopt(ft) = 7T2-------TTT----------772----^77> (312)
nun (A 8X) + max (A pA)
A{6,c} Ae{a,d}
see, e.g., Axelsson [3, Theorem 5.6] for a detailed explanation. This choice of a
leads to the convergence rate with the factor
n max (A2 /3A)
P = PaptiP) = where k(0) = 72777> for any b < 8 < c.
k{8) + 1 mm (A pA)
Ae{6,c}
(3.13)
Since 8 is assumed to be arbitrary from the interval (b, c), the above equality al-
lows us to conclude that the optimal convergence rate of method (3.5) applied to
solve system (3.1) occurs if 8 is chosen to minimize k(8) in (3.13). The latter is,
in fact, equivalent to the observation that method (3.3) with an optimal choice
of the parameter a applied to the family of systems (3.6) with (preconditioned)
coefficient matrices {S^ = (TA + 8I)TA}, b < 8 < c> delivers the best conver-
56


gence rate for the matrix Sgopt corresponding to fj = (3^, which minimizes the
condition number of Sp.
Now let (3 = Popt = c |6|. Then, since b2 (3optb = c2 poptc \b\ c, we
have
fc(Popt)
d ufydi if M -\b\ 2 ^ C
Q ,tfpta, if M ~\b\> d-c
\b\c
d\ f\b\ + d-c\
------1 if |a| -\b\ -|(.)(£M^|)iif|a!_l6l>d-c
One can check that the above choice (3 = (3upt indeed minimizes k(6) in (3.13),
e.g., by adding an arbitrary perturbation e to Popt and showing that the function
crpt + e) = (e) is increasing for e > 0 and decreasing for e < 0. This proves
the optimal convergence rate of method (3.5) given by the factor popt in (3.9)
with k k(Popt), where /^t = c-\b\ and, by (3.12), = aoptWopt), he.,
______________2___________
aopt |6| c + max (A2 ^optA)
AG
which results in expression (3.10).
We note that if ja| \b\ = d c, i.e., both intervals in (3.7) are of the same
length, then the optimal convergence factor p = popt in (3.9) is determined by
h . Although the proof of Theorem 3.2 does not rely on this assumption, it
be
is clear that for the general case, where ]a| |6| ^ d c, the expression for k in
(3.9) can be derived after extending the smaller interval to match the length of
the larger one by shifting the corresponding endpoint a or d, and then applying
the result for the intervals of the equal length. We also note that if [a, b] and [c, d]
57


are located symmetrically with respect to the origin, i.e., |aj = d and |6| = c,
then pcpt = 0 and method (3.5) turns into stationary iteration (3.3) applied to
/ a\ 2
normal equations with the optimal convergence rate determined by k = yj
which is essentially a square of the condition number of the matrix TA.
Finally, we remark that the idea of transforming the original symmetric
indefinite system (3.1) into an SPD system (3.6) with a minimized condition
number, which underlies method (3.5) and Theorem 3.2, has previously appeared
in literature, though without a preconditioner, e.g., in [3] in the context of the
Chebyshev iteration.
We will use scheme (3.5) as a base for obtaining simple preconditioned
residual-minimizing methods to solve system (3.1). Theorem 3.2 will allow us
to provide the corresponding convergence estimates.
3.1.2 Simple residual-minimizing methods for solving symmetric
indefinite systems with SPD preconditioners
Let us consider the following iterative scheme for solving a symmetric indef-
inite system (3.1) with an SPD preconditioner T and a fixed parameter /3:
l = sw pw, s(0 = TAw, w = Tr, r = h Ax,
= x(i) + al, a = b < & < c = 0> 1. .
(3.14)
where b and c are the endpoints of the intervals 1 in (3.7). Unlike stationary
iteration (3.5), method (3.14) allows the parameters a to vary at each step
such that the next approximation aT+1) corresponds to the residual vector with
the smallest T-norm in the affine space r + span {Al}, i.e.,
a = argmin||r^ aAl\\T.
(3.15)
a£R
58


The following theorem shows that method (3.14) converges to the exact
solution of system (3.1) for any b < < c, moreover the choice of fi [3opt
c |6| guarantees that (3.14) converges not slower than stationary iteration (3.5)
with optimal parameters.
Theorem 3.3 Let us consider method (3.14) applied to solve linear system (3.1)
with a symmetric indefinite coefficient matrix A and an SPD preconditioner T.
We assume that the spectrum of the matrix TA is only known to be enclosed
within the pair of intervals X in (3.7).
If b < 0 < c, then
(i+l)
|t*(0 I
. _ . max (A2 fiX)
\t ^ ^ u Kl_ Ae{a,d}
< P < 1, where p = , k = -2------------------ttt.
r k + 1 mm (A oA)
A e{6,c}v
(3.16)
Moreover, if (3 = fiopt = c |6|, then k is defined by (3.9).
Proof: By (3.15) we have
r(i+1)||r = ||r(i) a^Al^Wr < ||r(i) aAl[i) ||T Vo G R.
Let us assume that f3 is fixed, such that b < (3 < c. Then, following the proof
of Theorem 3.2, the choice of a = Oiopt{(3) as in (3.12), by (3.11), leads to
expression (3.13) for the convergence factor p in (3.16). One can verify that if
(3 = fiopt = c |6| then k is defined by (3.9).
Given a constant /3, e.g., provided by information about the spectrum loca-
tion or computational experience, scheme (3.14) represents the simplest residual-
minimizing method with the minimization at a step i performed over the one-
dimensional subspace span {^U^}, moreover the resulting convergence behavior
59


is in general improved compared to the one of the corresponding methods based
on solving normal equations.
If no information for the choice of /? is available, one can allow it to vary at
each step. For example, let us consider the following iterative scheme:
l = s(i) 8w, 5(0 = TAw, w = TV, r = b Ax,
H (3.17)
xh+b = x + a l, i 0,1,...,
where the parameters a and {3 are chosen to guarantee the minimal-
ity of the T-norm of the next residual vector r^+1^ over the affine space
r + span {Aw, As}, i.e., a and f3 in (3.17) are such that
||r(<+1)||T= min \\r u\\T. (3.18)
uespail{ AwW ,AsW }
Optimality condition (3.18) is equivalent to the following orthogonality condi-
tions
(r(i+1Us(i))r = (r{i+l\Aw)T = 0,
which provide the expressions for the iteration parameters:
o{i) (s, As)(w, As) (TAs, As)(w, Aw)
(w, As)2 (w, Aw)(s, As)
(i) (w,Al) (w, As)2 (w, Aw)(s,As)
a = (AIM,TAW) = (TAs^KAs^is^KAw^) (<*>, ytsW)2'
(3.19)
We note that along with expressions (3.19) for the choice of the parameters,
method (3.17)-(3.18) can admit other implementations, e.g., based on the prop-
erly restarted Lanczos procedure. The following theorem provides a bound on
the convergence rate of scheme (3.17)-(3.18).
60


Theorem 3.4 We consider method (3.17)-(3.18) applied to solve the linear sys-
tem (3.1) with a symmetric indefinite coefficient matrix A and an SPD precon-
ditioner T. We assume that the spectrum of the matrix TA is only known to be
enclosed within the pair of intervals I in (3.7). Then at each step of the method
the T-norm of the residual vector is reduced at least by the factor p in (3.9), i.e.,
< iLiL (320)
||rW||T -s+r ( '
Proof: Since and (3^ are such that ||r^+1^|| has the smallest T-norm over
rW + span {Aw^\ 24sW}, we get
||r(i+1)||r = ||r(i) a(i)i4/W||r = ||r{i) a(i)As(i) + a{i)p{i)Aw{i)\\T
< ||r^ otA (s^ f3w^) ||t, Va, /3 G R.
The choice of /? = (3^ = c |6| and a = a^t in (3.10), by Theorem 3.2, results
in convergence factor (3.9) for the reduction of the T-norm of the residual vector
at each step of method (3.17)-(3.18).
We remark that method (3.17)(3.18), for solving symmetric indefinite linear
systems with SPD preconditioners, described by convergence estimate (3.9),
(3.20), can be viewed as an analogue of the preconditioned steepest descent
iteration for solving SPD systems. In the next section we discuss methods,
including the optimal minimal residual iterations, which allow us to improve
convergence factor (3.9).
3.1.3 The second-order and minimal residual methods for solving
indefinite systems with SPD preconditioners
The ideas underlying methods (3.5), (3.14)-(3.15) and (3.17)-(3.18) can be
further extended to improve convergence factor (3.9). In particular, applying
61


the so-called second-order stationary iteration, i.e., the iteration of form (3.3)
with the additional term in the direction of the difference p = x
of approximations from the current and previous steps, to transformed system
(3.6), results in the following scheme for solving system (3.1) with an SPD
preconditioner T:
l = s pw, s = TAw, w = Tr, r = b- Ax,
^(t+i) x(i) _|_ q-W/(*) -f- (^M \)p ,p = x = x(\ (3.21)
i = 0,1,...,
where parameters a = a, ft = f3, 'y = 7 R are constant throughout
iterations. If, similarly to Theorem 3.2 for method (3.5), the parameter ft is
set to Popt = c |6|, then it is possible to show that there exist optimal values
for a and 7 such that scheme (3.21), with the stationary iteration parameters,
converges to the solution of (3.1) with the asymptotically average convergence
factor
Pavg
yfk, 1
(3.22)
y/k + 1
where k is defined in (3.9). In particular, the latter can be shown, e.g., by
using the convergence bound in Axelsson [3, Theorem 5.9] for the second-order
stationary iteration applied to transformed system (3.6) with ft = popt. Thus, the
convergence of method (3.21) with the optimal choice of stationary parameters
is given by
jpf CAV (3.23)
where C is a positive constant, and pavg is defined in (3.22).
In the same way as stationary scheme (3.5) has been extended to method
(3.17)(3.18), the second-order method (3.21), with a a, ft = ft and
62


7(*) = 7i can be generalized to have variable iteration parameters, chosen at
each step to minimize the T-norm of the next residual vector rb+1) in the affine
space + span {Aw^\ Asb)) Ap^}, i.e.,
||r(i+1)||T = min ||r(*) u\\t- (3.24)
uespan{.4 /tsW./ipW}
It is immediately seen that one step of method (3.21), (3.24) results in the
reduction of the residual T-norm, which is not worse than that provided by
method (3.17)(3.18), hence, convergence bound (3.9), (3.20) is valid for itera-
tion (3.21), (3.24). We remark that the latter bound is likely to be pessimistic
for method (3.21), (3.24), and, in practice, according to estimate (3.23) for iter-
ation (3.21) with optimal stationary parameters, it is reasonable to expect the
reduction of the residual norm by a factor of order (3.22).
Methods (3.17)(3.18) and (3.21), (3.24) are examples of convergent locally
optimal preconditioned methods for solving a symmetric indefinite system (3.1)
with an SPD preconditioner, based on the idea of the residual norm minimiza-
tion. The local optimality follows from the corresponding conditions (3.18) and
(3.24), which, at each step, seek to minimize the residual T-norm over certain
low-dimensional, local, subspaces of a fixed size.
As opposed to locally optimal methods, the preconditioned globally optimal
residual-minimizing methods for solving system (3.1), at each step i, extract a
minimizer for the appropriate residual norm from an (expanding) ^-dimensional
subspace. We now define the (globally optimal) Krylov subspace preconditioned
minimal residual methods for solving system (3.1) with a preconditioner T.
Definition 3.5 We say that a method to solve system (3.1) is a preconditioned
minimal residual method, if, at step i, it constructs an approximation a^*) to the
63


solution of system (3.1) of the form
x(i) e x(0) + Ki (TA, Tr(0)), (3.25)
and the corresponding residual vector r^ = b Ax^ is suc/i that
||r^||s = min ||r^ u\\s, (3.26)
ueAKi(TA,TrW)
where ICi(TA,Tr()) = span {7V\ (TA)Tr^\ ..., (T54)l_12V)} is the (pre-
conditioned) Krylov subspace generated by the matrix TA and the vector Tr^\
A)Ci (TA,Tr^) span {(AT)r(\ ... ,(AT)V^} is the corresponding Krylov
residxial subspace; ||x||| = (x,Sx) for some SPD operator S.
In particular, for general (square) matrices T and A, the preconditioned
minimal residual method with S T*T in (3.26) is delivered, e.g., by the pre-
conditioned GMRES [61, 33, 59]. The case where A is symmetric indefinite and
T is SPD with S = T is commonly fulfilled with the preconditioned MINRES
algorithm (PMINRES) [55, 33, 59], which is known to admit a short-term recur-
rent form while maintaining global optimality (3.26) in exact arithmetic. Scheme
(3.17)(3.18) corresponds to the preconditioned minimal residual method with
S = T restarted after every two steps. Iteration (3.21) with variable parameters
chosen according to (3.24) can be viewed as the same preconditioned mini-
mal residual method restarted after every two steps with the additional vector
pW = x
Finally, let us note that convergence factor (3.22) is commonly used to
estimate the convergence rate of the preconditioned minimal residual method
(3.25)-(3.26) with S = T (e.g., in PMINRES implementation), once the residual
64


norms are measured at every other step, i.e.,
(3.27)
see, e.g., [33, 59].
In the next section, we define the optimal SPD preconditioner T for mini-
mal residual methods (3.25)-(3.26), as well as for the locally optimal methods
described in the current section, applied to solve system (3.1) with a symmetric
indefinite coefficient matrix A.
3.2 Absolute value preconditioners for symmetric indefinite systems
In this section, we propose a novel concept of absolute value preconditioning,
where the preconditioner approximates the absolute value of the coefficient ma-
trix. We show, for a model problem, that such a preconditioner can be efficiently
constructed in the multigrid framework.
3.2.1 Optimal SPD preconditioners for symmetric indefinite systems
Let A E Rnxn be a symmetric matrix with an eigendecomposition A =
VAV*, where V is an orthogonal matrix of eigenvectors and A = diag{Aj}, j =
1,..., n, is a diagonal matrix of eigenvalues of A. We consider the factorization
of the form
where |Aj = V |A| V* is an (SPD) absolute value of the matrix A (matrix absolute
value), |A] = diag{|Ay [}, and sign(A) = Vsign(A)!/* is a sign of A (matrix sign),
sign(A) = diag{sign(Aj)}. Factorization (3.28) is, in fact, a polar decomposition,
see, e.g., [42], of the symmetric matrix A, with the positive (semi) definite factor
|A| and the orthogonal factor sign (A).
A = |A| sign(A) = sign(A) |A|
(3.28)
65


The following theorem states that the inverse of the absolute value of the
coefficient matrix is an optimal SPD preconditioner for the methods described in
the previous section, including minimal residual methods (3.25)-(3.26), applied
to solve a (general) symmetric indefinite linear system, i.e., T = = |A!_1.
Theorem 3.6 Any minimal residual method (3.25)-(3.26), applied to solve lin-
ear system (3.1) with a symmetric indefinite coefficient matrix A and the precon-
ditioner T |A|-1, converges to the exact solution in at most two steps. Further,
under the same assumptions on A andT, methods (3.14)~(3.15), (3.17)-(3.18),
and (3.21) satisfying (3.24), as we^ as schemes (3.5) and (3.21) with corre-
sponding optimal stationary iteration parameters, deliver the exact solution in
exactly one step.
Proof: Minimization property (3.26) at a step i of a preconditioned minimal
residual method can be equivalently written as
||rw||s= min ||p(AT)r(q)||s, (3.29)
pevit p(o)=i
where Vi is a set of all polynomials of degree at most i. Then, according to the
decomposition (3.28), the choice T |A|_1 results in the matrix AT = sign(A)
with only two distinct eigenvalues: 1 and 1. Hence the minimal polynomial
of AT is of the second degree. Thus, by (3.29), ||r^||5 = 0 for at most i = 2
and any SPD operator S. Thus any preconditioned minimal residual method
(3.25)-(3.26) converges to the exact solution of the symmetric indefinite system
(3.1) with T = |A|-1 in at most two steps.
The one-step convergence of the remaining methods follows from the obser-
vation that factors (3.9) and (3.22) are zero, since k = 1 if T = |A| 1.
66


If A is SPD, the claim of the theorem reduces to the trivial fact that the
optimal preconditioner for system (3.1) is the exact inverse of A. We also note
that actual implementations of minimal residual methods (3.25)-(3.26), at their
two consecutive iterations, perform essentially the same, in terms of matrix-
vector multiplications, number of computations as one step of methods (3.14)-
(3.15), (3.17)(3.18), (3.21) with condition (3.24), as well as stationary iterations
(3.5) and (3.21) with the optimal choice of iteration parameters. In this sense,
the two-step optimality result for a minimal residual method, given by Theorem
3.6, is compatible with the optimal one-step convergence of the above mentioned
methods, described in this section. The latter also explains the compatibility of
convergence estimates (3.23) and (3.27).
Remark 3.7 Methods of form (3.17) and (3.21) with T = |A|-1 and the corre-
sponding optimality conditions (3.18) and (3.24) replaced by the residual mini-
mization in an arbitrary S-norm, i.e.,
||r(*+1)n^g, mjn IJrC1) u\\s, (3.30)
uespan^AuiW
and,
||r(l+1)||s = min ||r(l) u\\s, (3.31)
ue span{ Aw(*) AsW, ApW }
respectively, also converge to the exact solution of symmetric indefinite system.
(3.1) in exactly one step for any SPD operator S.
In practical situations the construction of the optimal preconditioner =
|.4|_1 becomes prohibitive. We show, however, that the choice of the precondi-
tioner T as some approximation of T^t, i.e., T !A|-1, may lead to a significant
67


improvement in the convergence rate of an iterative method. For example, the
preconditioners T |A|_1 can be constructed by exactly inverting the abso-
lute value of a symmetric approximation of the coefficient matrix A, assuming
that the latter can be efficiently performed. In particular, if A is diagonally
dominant, then T can be chosen to be diagonal,
T = diag {| |1} ,
where a.jj are the diagonal entries of A. Let us agree to call an SPD precon-
ditioner T, such that T |A|_1, an absolute value preconditioner for a linear
system (3.1).
Due to a large problem size, we further assume that an absolute value pre-
conditioner T can be accessed only indirectly, e.g., through a matrix-vector
multiplication. In this case, given a vector r G Rn, there are several ways to
approach the construction of T by defining a vector w = Tr. As the first option,
at each step i of any method described in the previous section, one can attempt
to apply the absolute value preconditioner by approximately solving for 2 the
following equation
|A| z = r, (3.32)
where, e.g., r = or/and r = ATr^l\ depending on the selected method.
The coefficient matrix |,4| is generally not available. The problem of approx-
imately solving linear system (3.32) can be formally replaced by the problem
of finding a vector w which approximates the action of the matrix function
f(A) = |A|-1 on the vector r, i.e., w f(A)r |A|-1 r, moreover the construc-
tion of w does not require any knowledge of |A| or |A|-1. The latter constitutes
68


a well established task in matrix function computations which is standardly ful-
filled by a Krylov subspace method, e.g., [39, 32]. Our numerical experience
shows that though the convergence rate of a linear solver can be significantly
improved with this approach, the computational costs of the existing methods
for approximating f(A)r= |A|_1 r, e.g., the Lanczos method described in [12],
however, remain too high for their direct use in the context of absolute value
preconditioners for solving symmetric indefinite linear systems.
Another option to apply an absolute value preconditioner is to use a method,
based on a certain preconditioning technique, which is possibly divergent as a
stand-alone approximate solver for equation (3.32), e.g., since only limited infor-
mation about the coefficient matrix |A| is available, however, which (implicitly)
results in the construction of an approximation to |A|_1 of a reasonably good
quality. Below we demonstrate on the example of a model problem that such
construction of an efficient absolute value preconditioner is indeed possible, e.g.,
if based on MG techniques.
3.2.2 An absolute value preconditioner for a model problem
Let us consider the following boundary value problem,
- Au(x, y) c2u(x, y) = /(x, y), (x, y) G = (0,1) x (0,1),
u\r = 0,
(3.33)
where A = is the negative Laplace operator (or, Laplacian),
&x2 oy2
c G R, /(x, y) G C(fi), and T denotes the boundary of the domain 0. Problem
(3.33) is, in fact, a particular instance of the Helmholtz equation with Dirichlet
boundary conditions, c2 is a wavenumber; see, e.g., [69].
69


After introducing a uniform grid of the step size h (mesh size) and using the
standard 5-point finite-difference (FD) stencil to discretize continuous problem
(3.33), see, e.g., [30], one obtains the corresponding discrete problem, i.e., system
of linear equations (3.1) of the form
(L c2I)x = b, (3.34)
where the coefficient matrix A = L c2I (the discrete Helmholtz operator) rep-
resents a discrete negative Laplace operator L, satisfying the Dirichlet boundary
condition at the grid points on the boundary, shifted by a scalar c2 times the
identity matrix /. The right-hand side b in (3.34) corresponds to the vector of
function values of /(x, y) calculated at the grid points (numbered in the lexi-
cographical order). In our numerical tests, b is generated randomly. The exact
solution x = x* of system (3.34) then provides an approximation to the solution
of the boundary value problem (3.33) evaluated at the grid points.
Further, assuming that c2 is different from any eigenvalue of the SPD neg-
ative Laplacian L and is greater than its smallest, however less than its largest,
eigenvalue, i.e., Xmin(L) < c2 < Xmax(L), where Amin(L) = 2ir2 + 0(h2) and
Amax(A) = 8h~2 + 0(1), we conclude that the operator A = L c2/ is non-
singular symmetric indefinite. Thus, in order to solve system (3.34), accord-
ing to Theorem 3.6, one can choose any of the methods from the previous
section with an (absolute value) preconditioner T approximating the operator
|A|-1 = |L c2I\~\ Below we use the MG techniques to provide an examples
of such preconditioner. We refer to (3.34) as the model problem.
70


3.2.2.1 Multigrid absolute value preconditioner
In this section we use the ideas underlying the (geometric) MG methods,
e.g., [73, 14], to construct a preconditioner for the model symmetric indefinite
system (3.34). Combining the MG principles with the idea of the absolute
value preconditioners (Theorem 3.6), we construct an efficient preconditioner
for the model problem with low wavenumbers c2, i.e., if the operator A = L
c2I in (3.34) is slightly indefinite. We compare the proposed approach with a
preconditioning strategy based on the inverse of the Laplacian, which we set as
a benchmark to assess the quality of the constructed preconditioner.
Along with the (fine) grid of the mesh size h underlying the discretized
Helmholtz equation (3.34) let us consider a (coarse) grid of a mesh size H > h.
We denote the discretization of the negative Laplacian on this grid by Lh, Ih
represents the identity operator of the corresponding dimension. Further, we
assume that the fine-level absolute value | L c2/| is not computable, while its
coarse-level analogue \ LH c2IH\ can be efficiently constructed and/or inverted,
e.g., by the full eigendecomposition. Let us note that in the two-grid framework
we use the subscript H to refer to the quantities defined on the coarse grid. No
subscript is used for denoting the fine-grid quantities.
We suggest the following scheme as an example of the two-grid absolute
value preconditioner for model problem (3.34).
71


Algorithm 3.8 (Two-grid absolute value preconditioner)
Input r, output w.
1. Pre-smoothing. Apply v pre-smoothing steps with the zero initial guess
(w^ = Q):
wd+i) = w(i) + M-i(r Lw0)^ i = 0)..., v l, (3.35)
where the (nonsingular) matrix M defines the choice of a smoother. This
step results in the pre-smoothed vector wwe = w^v\ u > 1.
2. Coarse grid correction. Restrict the vector r Lwpre to the coarse grid,
multiply it by the inverted coarse-level absolute value \Lu c?Ih\, and then
prolongate the result back to the fine grid. This delivers the coarse-grid
correction, which is added to wpre to obtain the corrected vector wcgc:
wH = \LH-c2IH\~l R(r-Lwpre), (3.36)
wC9C = wpre + PwH, (3.37)
where P and R are prolongation and restriction operators, respectively.
3. Post-smoothing. Apply u post-smoothing steps with the initial guess w^ =
wcgc:
W{i+D = wd) + M-*(r Lwd)^ i = o,...,iy-l. (3.38)
This step results in the post-smoothed vector wpost = w^u\ Return w =
^jjpOSt
In (3.36) we have assumed that the coarse-grid operator \Lh c2Ih\ is invertible,
i.e., c2 is different from any eigenvalue of Ljj. The number of smoothing steps
72


T5
in (3.35) and (3.38) is the same; the pre-smoother is defined by the nonsingular
matrix M, while the post-smoother is delivered by M*.
We note that once the absolute value of the discrete Helmholtz operator
Lc2I on the fine level is not available, one can attempt to replace it by an easily
accessible SPD approximation, e.g., the negative Laplacian L = \L\ ss \L c2/|,
as was done in Algorithm 3.8. Although this substitution may result in the
divergence of the algorithm as a two-grid method for solving equation (3.32),
we show that its use as a preconditioner (along with its multigrid extension)
allows us to noticeably accelerate the convergence of the methods described in
the previous section applied to solve the symmetric indefinite model problem
(3.34) for shifts c2 of a relatively small size.
One can check that the two-grid Algorithm 3.8 implicitly constructs a map-
ping r i- w = Ttgr, where the operator T = Tt9 has the following structure:
Ttg= (I M-*L)U P\Lh c2/i/|_1 R(I LM~lY + S, (3.39)
with S L~x (I M~*L)U L~l (I LM~1)l/. In particular, in the context
of methods from the previous section, at each iteration i, the vector r is set
to rW or/and ATr^\ where = b (L c2I)x^ is the residual vector of
problem (3.34) at the i-th step of the corresponding method. The fact that the
constructed preconditioner T = Ttg is SPD, follows directly from the observation
that the first term in (3.39) is symmetric positive semi-definite provided that
P = aR* for some nonzero scalar a, while the second term S is symmetric and
positive definite if the spectral radii p(I M~lL) < 1 and p(I M~*L) < 1.
The latter condition, in fact, requires the pre- and post-smoothing iterations
(steps 1 and 3 of Algorithm 3.8) to represent convergent methods for system
73


(3.34) with c = 0 and b = r (i.e., for the discrete Poissons equation) on their
own. We note that the above argument for the operator T = Ttg to be SPD
essentially repeats the corresponding pattern to justify symmetry and positive
definiteness of a two-grid preconditioner applied within an iterative scheme, e.g.,
preconditioned conjugate gradient method (PCG), to solve a system of linear
equations with an SPD coefficient matrix; see, e.g., [13, 67].
Now let us consider a hierarchy of m + 1 grids numbered by l = m, m-1, ..., 0 with the corresponding mesh sizes {h_l} in decreasing order (h_m = h corresponds to the finest, and h_0 to the coarsest, grid). For each level l we define the discretization L_l - c^2 I_l of the differential operator in (3.33), where L_l is the discrete negative Laplacian on grid l, and I_l is the identity of the same size. In order to extend the two-grid absolute value preconditioner given by Algorithm 3.8 to the multigrid, instead of inverting the absolute value |L_H - c^2 I_H| in step 2 (formula (3.36)), we recursively apply the algorithm to the restricted vector R(r - L w^{pre}). This pattern is then followed, in the V-cycle fashion, on all levels, with the exact inversion of the absolute value of the discrete Helmholtz operator on the coarsest grid. The described approach can be viewed as replacing w_H in (3.36) by its approximation, i.e., constructing w_H ≈ |L_H - c^2 I_H|^{-1} R(r - L w^{pre}).

If started from the finest grid l = m, the following scheme gives the multilevel extension of the two-grid absolute value preconditioner defined by Algorithm 3.8. We note that the subscript l is introduced to match the occurring quantities to the corresponding grid.
Algorithm 3.9 (AVP-MG(r_l): MG absolute value preconditioner)

Input r_l, output w_l.

1. Pre-smoothing. Apply ν pre-smoothing steps with the zero initial guess (w_l^{(0)} = 0):

   w_l^{(i+1)} = w_l^{(i)} + M_l^{-1}(r_l - L_l w_l^{(i)}),   i = 0, ..., ν - 1,   (3.40)

   where the (nonsingular) matrix M_l defines the choice of a smoother on level l. This step results in the pre-smoothed vector w_l^{pre} = w_l^{(ν)}, ν ≥ 1.

2. Coarse grid correction. Restrict the vector r_l - L_l w_l^{pre} to the grid l - 1. If l = 1, then multiply the restricted vector by the inverted coarse-level absolute value |L_0 - c^2 I_0|^{-1},

   w_0 = |L_0 - c^2 I_0|^{-1} R_0 (r_1 - L_1 w_1^{pre}),   if l = 1.   (3.41)

   Otherwise, recursively apply AVP-MG to approximate the action of the inverted absolute value |L_{l-1} - c^2 I_{l-1}|^{-1} on the restricted vector,

   w_{l-1} = AVP-MG(R_{l-1}(r_l - L_l w_l^{pre})),   if l > 1.   (3.42)

   Prolongate the result back to the fine grid. This delivers the coarse-grid correction, which is added to w_l^{pre} to obtain the corrected vector w_l^{cgc}:

   w_l^{cgc} = w_l^{pre} + P_l w_{l-1},   (3.43)

   where w_{l-1} is given by (3.41)-(3.42). The operators R_{l-1} and P_l define the restriction from the level l to l - 1 and the prolongation from the level l - 1 to l, respectively.
3. Post-smoothing. Apply ν post-smoothing steps with the initial guess w_l^{(0)} = w_l^{cgc}:

   w_l^{(i+1)} = w_l^{(i)} + M_l^{-*}(r_l - L_l w_l^{(i)}),   i = 0, ..., ν - 1.   (3.44)

   This step results in the post-smoothed vector w_l^{post} = w_l^{(ν)}. Return w_l = w_l^{post}.
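A direct transcription of Algorithm 3.9 into code could look like the recursive sketch below (Python/NumPy is used purely for illustration; the lists of level operators, smoothers, and transfer operators are assumed to be supplied by the caller, and dense solves stand in for whatever smoother application is actually employed).

    import numpy as np

    def avp_mg(l, r, L, M, R, P, abs0_inv, nu=1):
        # One application of the MG absolute value preconditioner (Algorithm 3.9):
        #   l         current level (0 = coarsest, m = finest); r = r_l
        #   L[l]      level operator L_l;  M[l]  smoother matrix M_l
        #   R[l]      restriction from level l to l-1;  P[l]  prolongation from l-1 to l
        #   abs0_inv  exactly inverted coarse absolute value |L_0 - c^2 I_0|^{-1}
        w = np.zeros_like(r)
        # 1. Pre-smoothing with zero initial guess: w <- w + M_l^{-1}(r_l - L_l w)   (3.40)
        for _ in range(nu):
            w += np.linalg.solve(M[l], r - L[l] @ w)
        # 2. Coarse-grid correction
        rc = R[l] @ (r - L[l] @ w)
        if l == 1:
            wc = abs0_inv @ rc                                  # exact coarse solve (3.41)
        else:
            wc = avp_mg(l - 1, rc, L, M, R, P, abs0_inv, nu)    # recursion (3.42)
        w += P[l] @ wc                                          # corrected vector (3.43)
        # 3. Post-smoothing with the adjoint smoother M_l^{-*}                       (3.44)
        for _ in range(nu):
            w += np.linalg.solve(M[l].T, r - L[l] @ w)
        return w

Within a preconditioned solver, this V-cycle is applied once per iteration to the current residual to produce the preconditioned direction w = T_mg r.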
The described multigrid absolute value preconditioner implicitly constructs a mapping r ↦ w = T_mg r, where the operator T = T_mg has the following structure:

T_mg = (I - M^{-*}L)^ν P T_{m-1} R (I - LM^{-1})^ν + S,   (3.45)

with S as in (3.39) and T_{m-1} defined according to the recursion below,

T_l = (I_l - M_l^{-*}L_l)^ν P_l T_{l-1} R_{l-1} (I_l - L_l M_l^{-1})^ν + S_l,   l = 1, ..., m-1,
T_0 = |L_0 - c^2 I_0|^{-1},   (3.46)

where S_l = L_l^{-1} - (I_l - M_l^{-*}L_l)^ν L_l^{-1} (I_l - L_l M_l^{-1})^ν.

Let us note that in (3.45) we skip the subscript in the notation for the quantities associated with the finest level l = m. The structure of the multilevel preconditioner T = T_mg in (3.45) is similar to that of the two-grid preconditioner T = T_tg in (3.39), with |L_H - c^2 I_H|^{-1} replaced by the recursively defined operator T_{m-1} in (3.46). If the assumptions on the fine-grid operators M, M^*, R and P, sufficient to ensure that the two-grid preconditioner in (3.39) is SPD, remain valid throughout the coarser levels, i.e., P_l = α R_{l-1}^*, ρ(I_l - M_l^{-1}L_l) < 1 and ρ(I_l - M_l^{-*}L_l) < 1, l = 1, ..., m-1, then the symmetry and positive definiteness of the multigrid preconditioner T = T_mg in (3.45) is easily extended from the same property of a two-grid operator through relations (3.46).
3.2.2.2 Numerical examples
As mentioned before, the two-grid Algorithm 3.8, as well as its multilevel extension given by Algorithm 3.9, can be viewed as an attempt to solve equation (3.32) using an MG method, where the absolute value of the discrete Helmholtz operator on finer levels is replaced by its approximation, i.e., the discrete negative Laplacian. Alternatively, the described approach can be interpreted as essentially applying the V-cycle of an MG method to solve the discrete Poisson's problem (i.e., approximating an inverse of the Laplacian), however, with a modified coarse-grid solve.
In fact, the use of the inverse of the (shifted) Laplacian as a preconditioner for the Helmholtz equation (with possibly complex c^2), initially introduced in Turkel et al. [7], is well known and remains an object of active research, e.g., [50, 27, 75]. In our numerical tests below we consider the inverted Laplacian preconditioner as a benchmark to assess the quality of the MG absolute value preconditioner delivered by Algorithm 3.9.
Figure 3.1 illustrates several runs of PMINRES with MG absolute value preconditioners applied to solve model problem (3.34), which are compared to the corresponding runs of MINRES preconditioned with an exactly inverted (using the MATLAB backslash operator) negative Laplacian. The shifts (wavenumbers) c^2 are chosen to maintain a relatively small number of negative eigenvalues of the Helmholtz operator discretized on the grid of the mesh size h = 2^{-7}, i.e., c^2 = 100, 200, 300 and 400. The right-hand side vectors b as well as the initial guesses x_0 are randomly chosen (the same for each shift value); the tolerance for the 2-norm of the residuals (relative to the 2-norm of the right-hand side b) is 10^{-7}.
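These experiments were run in MATLAB; as a rough, self-contained stand-in (not the thesis's code), the SciPy fragment below sets up the 2D shifted-Laplacian model problem on a small illustrative grid and runs MINRES with the benchmark inverted-Laplacian preconditioner wrapped as a LinearOperator; an MG absolute value preconditioner would be passed in exactly the same way.

    import numpy as np
    from scipy.sparse import diags, identity, kron
    from scipy.sparse.linalg import LinearOperator, minres, splu

    m = 2**5 - 1                                  # illustrative grid, coarser than h = 2^{-7}
    T1 = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m)) * (m + 1)**2
    L = kron(identity(m), T1) + kron(T1, identity(m))     # 2D discrete negative Laplacian
    c2 = 100.0
    A = (L - c2 * identity(m * m)).tocsc()                # symmetric indefinite model matrix

    lu = splu(L.tocsc())                                  # "exactly inverted" negative Laplacian
    T = LinearOperator(A.shape, matvec=lu.solve)          # SPD preconditioner for MINRES

    rng = np.random.default_rng(0)
    b, x0 = rng.standard_normal(m * m), rng.standard_normal(m * m)
    x, info = minres(A, b, x0=x0, M=T, maxiter=2000)
    print(info, np.linalg.norm(b - A @ x) / np.linalg.norm(b))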
[Figure 3.1: four convergence plots; the panel titles include "Shift value c^2 = 100 (6 negative eigenvalues)" and "Shift value c^2 = 200 (13 negative eigenvalues)".]

Figure 3.1: Comparison of the MG absolute value and the inverted Laplacian preconditioners for PMINRES applied to the model problem of the size n = (2^7 - 1)^2 ≈ 1.6 × 10^4.
The MG components for the absolute value preconditioners are defined in the following way: the ω-damped Jacobi iteration as a (pre- and post-) smoother with the damping parameter ω = 4/5, the standard coarsening scheme (i.e., h_{l-1} = 2h_l) with the coarsest grid of the mesh size 2^{-4} (coarse problem size n_0 = 225), full weighting for the restriction, and piecewise multilinear interpolation for the prolongation; see, e.g., Trottenberg et al. [73] for more details. The number of the smoothing steps ν is chosen to be 1 and 2 (these runs are titled AVP-MG-JAC(1) and AVP-MG-JAC(2), respectively, in Figure 3.1; Laplace
corresponds to the case where the inverted Laplacian is used as a preconditioner). We note that the increase in the number of smoothing steps improves the quality of the MG preconditioner and results in faster (in terms of the iteration count) convergence of PMINRES. PMINRES with the absolute value preconditioners is also observed to be more robust with respect to the increase of the shift value compared to the case with the inverted Laplacian.
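For completeness, one plausible way to assemble these components (a hypothetical sketch, not the implementation behind the reported runs): the 2D full-weighting restriction and the multilinear (bilinear) prolongation for standard coarsening can be built from their 1D counterparts by Kronecker products, while the ω-damped Jacobi smoother corresponds to taking M_l as the scaled diagonal of L_l.

    import numpy as np
    from scipy.sparse import csr_matrix, kron

    def full_weighting_1d(n_coarse):
        # 1D full-weighting restriction from n_fine = 2*n_coarse + 1 points
        R = np.zeros((n_coarse, 2 * n_coarse + 1))
        for j in range(n_coarse):
            R[j, 2*j:2*j + 3] = [0.25, 0.5, 0.25]
        return csr_matrix(R)

    R1 = full_weighting_1d(2**3 - 1)       # illustrative: 15 -> 7 points per direction
    R2 = kron(R1, R1)                      # 2D full weighting for standard coarsening
    P2 = 4.0 * R2.T                        # bilinear interpolation, P = 4 R^T in 2D

    # omega-damped Jacobi smoother on level l with operator L_l and omega = 4/5:
    #   M_l = diag(L_l) / omega,  i.e., one sweep is  w <- w + omega * (r - L_l w) / diag(L_l)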
            h = 2^{-7}   h = 2^{-8}   h = 2^{-9}   h = 2^{-10}
c^2 = 100       15           14           14           14
c^2 = 200       21           21           21           21
c^2 = 300       31           32           32           30
c^2 = 400       40           39           40           40

Table 3.1: Mesh-independent convergence of PMINRES with the MG absolute value preconditioner.
Table 3.1 shows the mesh-independence of the convergence of PMINRES with the MG absolute value preconditioner (one pre- and post-smoothing step) given by Algorithm 3.9. The rows of the table correspond to the shift values c^2 and the columns to the mesh size h. The cell at their intersection contains the number of steps performed to reduce the error norm to 10^{-8} of its initial value. The mesh size of the coarse grid was kept the same throughout all runs, i.e., h_0 = 2^{-4} (n_0 = 225).
It can be observed from Table 3.1 that the quality of the MG absolute value preconditioner deteriorates as the shift value increases. Figure 3.2, which shows the number of PMINRES iterations performed to decrease the norm of the initial error by the factor 10^{-8} for a given value of c^2, reflects the speed of this deterioration. The number of pre- and post-smoothing steps is set to one. We note that for higher wavenumbers it may be desirable to have a finer grid on the coarsest level in Algorithm 3.9.

Figure 3.2: Performance of the MG absolute value preconditioners for the model problem with different shift values. The problem size n = (2^7 - 1)^2 ≈ 1.6 × 10^4. The number of negative eigenvalues varies from 0 to 75.
Figure 3.3 compares the locally optimal preconditioned methods (3.17)-(3.18), denoted by AVP-MG-LO1, and (3.21), (3.24), denoted by AVP-MG-LO2, with the (globally optimal) preconditioned (AVP-MG-MINRES) and unpreconditioned MINRES. The MG absolute value preconditioner, defined according to Algorithm 3.9, is set up as in the previous tests, with one pre- and post-smoothing step. We count the multiplication of a vector by the preconditioned matrix TA as one matrix-vector product. In the unpreconditioned case T = I. We also assume that methods (3.17)-(3.18) and (3.21), (3.24) are implemented
to perform twice as many matrix-vector multiplications per step as the PMIN-
RES algorithm.
[Figure 3.3: two convergence plots, titled "Comparison of preconditioned iterations, c^2 = 100" and "Comparison of preconditioned iterations, c^2 = 200".]

Figure 3.3: Comparison of PMINRES with locally optimal methods (3.17), (3.19) and (3.21), (3.24), all with the MG absolute value preconditioners, applied to the model problem of the size n = (2^7 - 1)^2 ≈ 1.6 × 10^4.
As expected, the preconditioned globally optimal method exhibits a better convergence rate than methods (3.17), (3.19) and (3.21), (3.24). Method (3.21), (3.24) is noticeably faster than (3.17), (3.19), which demonstrates that the introduction of the vector p^{(i)} in (3.21) indeed improves the convergence. At a number of initial steps, iteration (3.21), (3.24) is comparable with PMINRES; however, the latter significantly accelerates at a certain step (possibly with the onset of superlinear convergence), while the former continues to converge at essentially the same rate.
3.3 Conclusions
In this chapter we have introduced a new preconditioning strategy for sym-
metric indefinite linear systems, which is based on the idea of approximating the
inverse of the absolute value of the coefficient matrix. We call SPD precondi-
tioners constructed according to this principle the absolute value preconditioners.
We have been able to show that, for the model problem of a linear system with a two-dimensional shifted discrete negative Laplace operator as a coefficient matrix, the construction of an absolute value preconditioner can be efficiently performed using (geometric) MG techniques. The symmetry and positive definiteness of the suggested preconditioners allow them to be used within the optimal short-term recurrent Krylov subspace methods for symmetric indefinite linear systems, e.g., PMINRES. In the next chapter, we show that the absolute value preconditioners can also be used for computing the smallest magnitude eigenvalues and the corresponding eigenvectors of symmetric operators.
The future direction of the related research, as we envision it at the moment, includes the extension of known preconditioning techniques, e.g., domain decomposition, algebraic multigrid, etc., for constructing absolute value preconditioners. Of particular interest to us is the construction of algebraic absolute value preconditioners, as opposed, e.g., to the geometric MG used to justify the concept in the current chapter. The multilevel methods seem to us quite promising for constructing efficient preconditioners, since they allow all the intensive computations, e.g., the inversion of a matrix absolute value using the full eigendecomposition, to be performed on a coarse space of a relatively low dimension. It is also of interest to relate the multilevel framework to relevant factorizations that can be used for preconditioning symmetric indefinite systems, e.g., the Bunch-Parlett factorization, see [29, 15], performed on a coarse space.
A significant part of this chapter has been devoted to the locally optimal preconditioned methods for solving symmetric indefinite linear systems. Unlike the preconditioned minimal residual method, e.g., the PMINRES algorithm,
these methods lack the global optimality and, hence, demonstrate slower convergence. As will be seen in the next two chapters, the study of the locally optimal schemes is of crucial importance for extending the ideas underlying the linear solvers to the eigenvalue and singular value computations. We also note that the understanding of the behavior of the locally optimal methods is important for the completeness of the theory of the residual-minimizing methods for symmetric indefinite linear systems. In certain frameworks, e.g., if the preconditioner is variable, these schemes can become the methods of choice. The study of the convergence behavior of the locally optimal iterations with variable preconditioning represents one of the directions of future research.
4. Preconditioned computations of interior eigenpairs of symmetric
operators
In this chapter, we consider the generalized symmetric eigenvalue problem (eigenproblem)

Av = λBv,   A = A^* ∈ R^{n×n},   B = B^* > 0 ∈ R^{n×n},   (4.1)

where the targeted eigenpair corresponds to the smallest, in the absolute value, eigenvalue of the matrix pencil A - λB. It is well known, e.g., [56], that problem (4.1) has all real eigenvalues λ_i, while the corresponding eigenvectors v_i, such that Av_i - λ_i Bv_i = 0, can be chosen B-orthogonal, i.e., (v_i, v_j)_B = (v_i, Bv_j) = 0, i ≠ j. If B = I, then the generalized problem (4.1) reduces to the standard symmetric eigenproblem.
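For a small dense illustration of (4.1) (randomly generated matrices, not data from any application considered below), SciPy's generalized symmetric eigensolver returns real eigenvalues and B-orthonormal eigenvectors, from which the eigenvalue of smallest absolute value targeted in this chapter can be selected.

    import numpy as np
    from scipy.linalg import eigh

    rng = np.random.default_rng(1)
    n = 200
    A = rng.standard_normal((n, n))
    A = (A + A.T) / 2                                  # A = A^*, indefinite
    C = rng.standard_normal((n, n))
    B = C @ C.T + n * np.eye(n)                        # B = B^* > 0

    lam, V = eigh(A, B)                                # A v_i = lambda_i B v_i
    print(np.allclose(V.T @ B @ V, np.eye(n)))         # eigenvectors are B-orthonormal

    k = np.argmin(np.abs(lam))                         # targeted eigenpair: smallest |lambda|
    print(lam[k], np.linalg.norm(A @ V[:, k] - lam[k] * (B @ V[:, k])))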
Problems of the form (4.1) appear in a variety of applications, e.g., the analysis of a system's vibration modes, buckling, electronic structure calculations of materials, graph partitioning, etc. The resulting operators A and B are often extremely large, possibly sparse, and ill-conditioned. It is usually required to find a small fraction of the eigenpairs, which typically correspond to neighboring eigenvalues of the pencil A - λB.
An important class of symmetric eigenproblems (4.1) seeks to find several
extreme, i.e., algebraically largest or smallest, eigenvalues and the correspond-
ing eigenvectors (extreme eigenpairs). If the problem size is large, there are a
number of well-established methods which can be employed to approximate the
extreme eigenpairs: the Lanczos method and its variations [56], the Jacobi-
Davidson method (JD) [64], the family of preconditioned conjugate gradient
(PCG) iterations, surveyed, e.g., in [54], etc. Though different in their formu-
lations, many of the methods, in fact, follow the same framework, i.e., they
perform the Rayleigh-Ritz procedure, see, e.g., [56], on certain low-dimensional
subspaces, further called the trial subspaces. The choice of the trial subspaces
essentially constitutes the main difference between such methods, also called
projection methods. For example, the Lanczos method performs the Rayleigh-Ritz procedure on the Krylov subspaces, the JD relies on the subspaces obtained by solving correction equations, and the locally optimal (block) PCG methods [48] use spans of the current eigenvector approximations, the preconditioned residuals, and the conjugate directions. For a comprehensive review of the relevant algorithms we refer the reader to [4].
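The common building block just mentioned can be sketched in a few lines (a generic illustration, not specific to any one of the cited methods; the routine name is ours): given a basis V of a low-dimensional trial subspace, the Rayleigh-Ritz procedure projects the pencil (A, B) onto the subspace, solves the resulting small generalized eigenproblem, and lifts the eigenvectors back.

    import numpy as np
    from scipy.linalg import eigh

    def rayleigh_ritz(A, B, V):
        # Rayleigh-Ritz for the pencil (A, B) on the trial subspace spanned by the
        # columns of V (assumed linearly independent); returns the Ritz values and
        # the corresponding B-orthonormal Ritz vectors.
        Ap = V.T @ (A @ V)            # projected small pencil
        Bp = V.T @ (B @ V)
        theta, Y = eigh(Ap, Bp)       # small dense generalized eigenproblem
        return theta, V @ Y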
Another important class of eigenproblems (4.1) aims at finding several eigenpairs corresponding to the eigenvalues in the interior of the spectrum of the pencil A - λB (interior eigenpairs). In particular, an important case is to find a number of eigenpairs corresponding to the eigenvalues with the smallest absolute values of a symmetric indefinite matrix. Large problems of this type frequently appear in applications, e.g., in the electronic structure calculations, see [68, 60], where a number of eigenpairs of a Hamiltonian matrix around a given energy level need to be found. The standard approaches for finding the interior eigenpairs are typically based on the shift-and-invert (SI), e.g., [4], or on the folded spectrum (FS), e.g., [71] and the references therein, transformations, and the subsequent application of one of the above-mentioned methods, e.g.,
PCG, for finding extreme eigenpairs of the transformed problem. Both of the approaches, however, have potential disadvantages. To apply SI, at each step of a method, one needs to solve a large linear system involving the shifted matrix A. The FS-based methods worsen the conditioning of the problem, possibly increase the clustering of the targeted (transformed) eigenvalues, and are not easily applicable to generalized eigenproblems, i.e., when B ≠ I.
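To make the two transformations concrete for the standard case B = I (a self-contained toy example with an illustrative random matrix, not taken from the applications above): shift-and-invert works with (A - σI)^{-1}, so every iteration requires a linear solve with the shifted matrix, whereas the folded spectrum works with (A - σI)^2, which is SPD but has a roughly squared condition number and more clustered target eigenvalues.

    import numpy as np
    from scipy.linalg import eigh
    from scipy.sparse.linalg import eigsh

    rng = np.random.default_rng(2)
    n, sigma = 400, 0.3                               # illustrative size and target shift
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    A = Q @ np.diag(np.linspace(-5.0, 5.0, n)) @ Q.T  # symmetric indefinite test matrix

    # Shift-and-invert: ARPACK solves a system with A - sigma*I at every step
    vals_si, _ = eigsh(A, k=4, sigma=sigma)

    # Folded spectrum: smallest eigenvalues of (A - sigma*I)^2, then "unfold"
    F = (A - sigma * np.eye(n)) @ (A - sigma * np.eye(n))
    _, Y = eigh(F, subset_by_index=[0, 3])            # four smallest eigenvalues of F
    unfolded = np.sort([y @ A @ y for y in Y.T])      # Rayleigh quotients in A

    print(np.sort(vals_si))
    print(unfolded)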
In this chapter, we introduce a method, which we refer to as the Preconditioned Locally Minimal Residual method (PLMR), that allows us to compute an eigenpair corresponding to the smallest, in the absolute value, eigenvalue of problem (4.1). The described approach does not require any preliminary transformation of the eigenproblem and is applied directly to the pencil A - λB. The PLMR method uses an SPD (absolute value) preconditioner to improve its convergence rate and robustness, and is based on the so-called refined procedure [44], performed in the preconditioner-based inner product, to extract eigenvector approximations from four-dimensional trial subspaces. Although the current work is concerned with finding only one eigenpair, the computation of several eigenpairs can be done similarly, either by using the method on properly deflated subspaces, or by generalizing the presented ideas to the subspace iteration.
The present chapter is organized as follows. In Section 4.1, we discuss the concept of an idealized short-term recurrent preconditioned method (eigensolver) for finding an interior eigenpair. We establish a connection between the solution of symmetric indefinite systems and eigenproblems, which allows us to extend the results of the previous chapter, including the idea of the absolute value preconditioning, to the case of the eigenvalue computations. In Section 4.2, we describe