
Material Information

Title:
Robust iterative methods on unstructured meshes
Creator:
Brezina, Marian
Publication Date:
Language:
English
Physical Description:
xiii, 174 leaves : illustrations ; 29 cm

Subjects

Subjects / Keywords:
Iterative methods (Mathematics) ( lcsh )
Iterative methods (Mathematics) ( fast )
Genre:
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

Notes

Bibliography:
Includes bibliographical references (leaves 165-174).
General Note:
Submitted in partial fulfillment of the requirements for the degree, Doctor of Philosophy, Applied Mathematics.
General Note:
Department of Mathematical and Statistical Sciences
Statement of Responsibility:
by Marian Brezina.

Record Information

Source Institution:
University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
38277026 ( OCLC )
ocm38277026
Classification:
LD1190.L622 1997d .B74 ( lcc )


Full Text
ROBUST ITERATIVE METHODS ON
UNSTRUCTURED MESHES
by
MARIAN BREZINA
M.S., Charles University, Prague, 1990
A thesis submitted to the
University of Colorado at Denver
in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
Applied Mathematics


This thesis for the Doctor of Philosophy
degree by
Marian Brezina
has been approved
by
Thomas F. Russell
Date


Brezina, Marian (Ph.D., Applied Mathematics)
Robust Iterative Methods on Unstructured Meshes
Thesis directed by Professor Jan Mandel
ABSTRACT
We propose and analyze three multilevel iterative solvers of both domain de-
composition and multigrid type. All of these methods are algebraic, allowing
almost or fully black-box implementation. Their development was motivated by
the need to solve large algebraic systems of equations resulting from finite ele-
ment discretizations of self-adjoint, second order uniformly elliptic problems on
unstructured three-dimensional meshes. Two of the methods discussed perform
a simple, but effective domain decomposition as a part of the solving process.
This allows for a remarkable adaptivity, where the decomposition is generated
depending on the difficulty of the problem without requiring an input of a differ-
ent decomposition. We focus on achieving robustness features that allow using
the new methods as a replacement of direct solvers for solving these systems. The
new methods are superior in terms of computational complexity and storage re-
quirements. On serial architectures, the asymptotic computational complexity
of these methods for solving 3D problems is shown to be in the range of O(n^{7/6})
to O(n^{49/33}). The methods all benefit from implementation on modern par-
allel architectures, which can reduce the computational complexity to O(n^{7/6})
for all three methods. The theoretical results are accompanied by computa-
tional experiments confirming the theoretically predicted convergence properties
and suggesting the potential of the methods for solving a much wider variety of
problems than the ones covered by the current theory.
This abstract accurately represents the content of the candidate's thesis.
I recommend its publication.
Signed
Jan Mandel
DEDICATION
Dedicated to Jitka Křížková.


ACKNOWLEDGEMENTS
It is my pleasant duty to acknowledge my gratitude to a number of
individuals who have influenced my work. I would like to thank all the faculty and
staff at the Center for Computational Mathematics at University of Colorado at
Denver for creating a great environment for research and computing. My thanks
go to professors Leo Franca, Tom Manteuffel, Steve McCormick and Tom Russell,
whose classes I had the honor of enjoying. The joint research meetings with the
members of the Department of Aerospace Engineering of the University of Colorado
at Boulder motivated me to focus on the solution of real-world problems;
most notably, I thank Charbel Farhat for kindly providing the test data from
engineering practice. I thank all my collaborators for fruitful cooperation. I
want to express my gratitude to Caroline Heberton for carefully reading draft
copies of this thesis and proof-reading them. If the reader finds the text enjoyable
to read, it is by her merit. My sincere appreciation is due to my friend Petr Vaněk
for a great many conversations, both on and off the topic of numerical methods,
and for teaching me how overrated cleaning one's apartment can really be. Last
but most I want to thank my advisor, Jan Mandel, for his guidance, generosity
and friendship. His reading courses and informal meetings proved to be both
an inspiration and fun. It was he and Petr Vaněk who originally aroused my
curiosity and enthusiasm for the fully algebraic approach to iterative solvers. I am
looking forward to our future cooperation.


CONTENTS
Chapter
1 Introduction and Model Problem................................. 1
1.1 Prologue.................................................. 1
1.2 Formulation of the Problem and Notation................... 6
1.3 Direct Solvers .......................................... 11
1.4 Preconditioned Conjugate Gradient Acceleration........... 13
2 Overview of Some Domain Decomposition
Methods ...................................................... 24
2.1 Abstract Schwarz Methods................................. 24
2.2 Overlapping Schwarz ..................................... 28
2.3 Non-overlapping Decomposition Methods.................... 30
2.4 Methods Reduced to Interface ............................ 30
2.5 The Neumann-Neumann Method............................... 34
2.6 EBE method............................................... 39
2.7 BDD as an Algebraic Method............................... 41
2.8 DD as a 2-level Multigrid................................ 50
2.9 Other Domain Decomposition Methods....................... 52
3 Fully Black-box Overlapping Schwarz
Method ....................................................... 53
3.1 Overlapping Schwarz Method with a Coarse Space........... 54
3.2 Smoothed Aggregation Coarse Space and BOSS............... 68
3.3 Estimates for Smoothed Aggregation......................... 76
3.4 Practical Issues........................................... 83
3.4.1 Generation of Aggregates............................ 84
3.4.2 Nonscalar Problems.................................. 86
3.4.3 Computational Complexity............................ 87
4 Nonoverlapping Methods with Inexact
Solvers.................................................... 90
4.1 Inexact Subdomain Solvers.................................. 90
4.2 The Inexact Solvers Properties............................ 93
4.3 Matsokin-Nepomnyaschikh Abstract Theory.................... 95
4.3.1 Abstract Framework and Condition Number Estimate . . 96
4.3.2 An Application: Abstract Additive Schwarz Methods . . 98
4.4 Unextended Hybrid Schwarz Algorithm....................... 101
4.4.1 BDD as a Hybrid Schwarz Algorithm................... 103
4.4.2 A New Look at the Conditioning of BDD............... 105
4.5 Extended Hybrid Schwarz Algorithm......................... 107
4.5.1 Algorithm on Long Vectors with Exact Components . . . 107
4.5.2 Practical Algorithm on Long Vectors and ACDD........ 109
4.5.3 Estimate on Long Vectors for Inexact Neumann Solvers . 111
4.5.4 Inexact Coarse Space and Harmonic Extensions ........... 113
4.5.5 Computational Complexity............................ 117
5 Two-Level Multigrid Alternative............................ 120
5.1 Alternative For Inexact Solvers........................... 120
5.2 Tentative Prolongator and Standard Two-level Multigrid . 121
5.3 Modified Two-level Multigrid and MLS...................... 126
5.4 Practical Issues.......................................... 135
5.4.1 Generalizations ........................................ 135
5.4.2 Computational Complexity................................ 136
6 Numerical Experiments.......................................... 138
7 Conclusions.................................................... 154
Appendix
A Theoretical Results ........................................... 157
A.1 Poincaré-Friedrichs Inequality........................... 157
B Results Used In Computational
Experiments.................................................... 163
B.1 The Stopping Criterion for PCG........................... 163
References........................................................... 166
FIGURES
Figure
2.1 Harmonic extension of a corner peak............................ 35
2.2 Harmonic extension of a peak on the edge....................... 35
2.3 Harmonic extension of a function constant on one edge.......... 36
2.4 Harmonic extension of a function linear on one edge............ 36
2.5 Harmonic extension of a function with random values on one
edge............................................................... 37
3.1 Possible assignment of elements to different classes C_i in 2D. . . 72
3.2 Possible assignment of elements to different classes C_i in 3D. . . 73
4.1 Abstract framework scheme.......................................... 96
6.1 The checkerboard coefficient pattern. Dark subdomains corre-
spond to values α₁, the light ones to α₂.................. 145
6.2 The mesh of the automobile wheel (data courtesy of Charbel
Farhat, University of Colorado at Boulder)........................ 149
6.3 The mesh of the solid with tetrahedron elements (data courtesy
of Charbel Farhat, University of Colorado at Boulder)....... 151
6.4 The mesh of the turbine with 123,120 elements (data courtesy
of John Abel, Cornell University)................................. 152
TABLES
Table
6.1 Comparison of BDD with ACDD(k) for different values of k in
case of the checkerboard coefficient pattern.................. 140
6.2 Comparison of BDD with ACDD(k) for problem with exponen-
tially distributed random coefficients...................... 142
6.3 Comparison of BDD with ACDD(k) for problem with uni-
formly distributed random coefficients...................... 143
6.4 Comparison of BDD with BOSS and MLS for problem with
coefficient jumps in a checkerboard pattern formed by 125
subdomains. Coarse spaces of dimensions 2744 and 125 were
used for MLS and BOSS. Prolongation smoother of degree 1
was used, and MLS used 2 pre-smoothers, 2 post-smoothers. . 144
6.5 Comparison of BDD with BOSS and MLS for problem with
uniformly distributed random coefficients and coarse space of
dimension 125. Prolongation smoother of degree 1 was used,
and MLS used 2 pre-smoothers, 2 post-smoothers................ . 145
6.6 Comparison of BDD with BOSS and MLS for problem with
uniformly distributed random coefficients and coarse space of
dimension 125. Prolongation smoother of degree 4, and 4 pre-
smoothers and 4 post-smoothers were used in MLS............... 146
6.7 Comparison of BDD with BOSS and MLS for problem with
exponentially distributed random coefficients and coarse space
of dimension 125. Prolongation smoother of degree 1 was used
for both BOSS and MLS, and 2 pre-smoothers and 2 post-
smoothers in MLS............................................. 147
6.8 Comparison of BDD with BOSS and MLS for problem with
exponentially distributed random coefficients and coarse space
of dimension 125. Prolongation smoother of degree 4 was used
for BOSS and MLS, and 4 pre- and postsmoothers in MLS. . 147
6.9 Comparison of BDD with BOSS and MLS for problem with
uniformly distributed random coefficients and coarse space of
dimension 2,744, degree 1 of prolongator smoother, and 2 pre-
and postsmoothers in MLS..................................... 147
6.10 Comparison of BDD with BOSS and MLS for problem with
exponentially distributed random coefficients and coarse space
of dimension 2,744. Prolongation smoother of degree 1 was used,
and 2 pre-smoothers and 2 post-smoothers in MLS.............. 148
6.11 Comparison of BOSS and MLS for solving the shell problem on
a mesh discretizing an automobile wheel with 9,915 nodes and
59,490 degrees of freedom. Prolongator smoother of degree 1
was used, and 4 pre-smoothers and 4 post-smoothers in MLS. 149
6.12 Comparison of BOSS and MLS for solving the 3D elasticity
problem with 25,058 nodes and 75,174 degrees of freedom. Pro-
longator smoother of degree 1 was used, and 2 pre-smoothers,
2 post-smoothers in MLS..................................... 150
6.13 Comparison of BOSS and MLS for solving a shell problem:
a propeller with 41,040 nodes and 123,120 degrees of free-
dom. Prolongator smoother of degree 1 was used, and 4 pre-
smoothers, 4 post-smoothers in MLS......................... 151
The sole aim of science is the honor of the human mind,
and from this point of view a question about numbers
is as important as a question about the system of the world.
- C. G. J. Jacobi


1. Introduction and Model Problem
1.1 Prologue
After discretization, many problems of engineering practice reduce to
the numerical solution of linear systems. These systems are typically very large,
sparse, unstructured and ill-conditioned. The performance of linear solvers for
these discretized problems in terms of computational complexity is usually ap-
proximated by κn^β, where n is the number of degrees of freedom and κ, β are
constants dependent on the particular choice of solution method.
The value of κ is smaller for direct solvers than for iterative ones, which
makes them less costly for small to medium size problems. Furthermore, for
direct solvers κ is typically much less sensitive with respect to the conditioning
of the problem. This robustness feature is the chief advantage direct solvers
provide. Unfortunately, the exponent β is significantly larger for direct solvers,
which makes their appli-
cation prohibitively expensive for large problems. Further, the lack of structure
in the problems makes efficient and versatile implementation of direct solvers on
modern vector and parallel computers troublesome. Another problematic issue
concerning direct solvers is the amount of storage needed to carry out the nec-
essary computations, as even the most sophisticated direct methods cannot, as
a matter of principle, avoid fill-in in computing factorization of a matrix.
The ever increasing demand to analyze very large finite element sys-
tems together with the drawbacks of direct solvers suggests considering iterative
solvers as an alternative for solving today's and future problems of practical
interest. In order to design an iterative method as a worthy replacement of ex-
isting state of the art direct solvers for solution of the large problems, the issue
of robustness has to be resolved. In order to achieve robustness features similar
to those of direct methods, it is necessary to find ways of easing the dependence
of their constant κ on the conditioning of the original problem. A common way
to tackle this problem is an application of a conjugate gradient method with a
properly chosen preconditioner. For a large class of problems, two- or multilevel
methods seem to be the most appropriate choice for a preconditioner because of
their convergence and computational complexity properties. It is desirable for
commercial solver packages to be able to generate coarse discretizations with the
property that smooth error components on a given discretization are well ap-
proximated by a consecutive coarser level space. In addition, the process should,
if possible, be completely automatic, without dependence on explicit knowledge
of the underlying finite element mesh. The solvers should be able to rely on as
little information about the problem as possible (in the extreme case, only its dis-
crete representation), while being able to exploit any additional information that
might be available. Despite the limited input, they must be able to efficiently
treat practical problems.
The idea of employing a problem of reduced dimension related to the
problem to be solved had been known on both discrete and continuous levels
long before the advent of systematic studies of multilevel methods. It appeared
in the works on the problems of economic modeling by Leontief in the 1930s in a
context closely related to the aggregation technique we will use in Section 3.2.
The idea of nested iterations related to the multigrid method can be traced back to
Southwell in the 1940s. On the continuous level, the idea is known to have been
employed in solving the problems of reactor physics by Enrico Fermi in the early
1940s and later by Morozov [70] and others.
Despite the wealth of the multilevel literature, up until recently the
known multilevel methods could not guarantee the above requirements of gen-
erality and mesh-independence for general nonuniform meshes.
For these reasons, algebraic multilevel iterative techniques are very attractive.
The two-level domain decomposition offers one solution, mixing direct solvers at
subdomain level with an iterative method on the interface level. The localization
of the data in these methods makes them very suitable for modern parallel com-
puters. Another candidate is the class of algebraic multigrid methods. These
are usually harder to parallelize, but their convergence properties and possibly
optimal computational complexity are very favorable. The construction of these
algebraic multigrid (AMG) and domain decomposition (DD) methods has been
the focus of interest in recent years, as the growing list of references
dealing with the topic attests (e.g., the pioneering works leading to the AMG of the
present day by Brandt, McCormick, Ruge and Stüben [16, 77, 78, 82], and the
more recent papers by Vaněk, Mandel and Brezina [87, 86]).
Our goal will be to design and study such algebraic methods. Focusing
on the class of problems arising from the finite element discretization of ellip-
tic partial differential equations, we propose several iterative solvers. Although
these are efficient methods in their own right, we will view them mainly as pre-
conditioners for the preconditioned conjugate gradient method.
The work is organized in 7 chapters. In the rest of Chapter 1 we intro-
duce a model problem and some notation used throughout this thesis. We also
recall the preconditioned conjugate gradient method as a means of efficient solu-
tion of large sparse systems. Chapter 2 gives an overview of some existing state
of the art domain decomposition methods which bear relevance to the methods
we will devise. Here we give comments and examples relevant to our main goal
of developing robust methods for problems on unstructured meshes. In Chap-
ter 3 we design an overlapping domain decomposition preconditioner with a new
coarse space. We will call the new method BOSS. We prove optimal convergence
properties of BOSS under regularity-free assumptions on the problem and very
weak assumptions on the finite element discretization mesh. The main advantage
of this method is its applicability to problems discretized on a variety of types of
finite elements and on unstructured meshes. The method also resolves the setup
difficulties usually associated with overlapping domain decomposition methods.
The overlapping subdomains are generated inside the solver by simple algebraic
means. Both the setup and the iteration itself are purely algebraic. One re-
markable quality of the method is its adaptivity; for more difficult problems, we
use more extensive overlapping achieved without ever partitioning the domain
into subdomains outside the solver. The asymptotic computational complexity
of the method implemented on a serial architecture in the optimal case is O(n^{4/3})
and O(n^{49/33}) in 2D and 3D, respectively. This is far below the cost of a back-
substitution step of typical direct solvers for sparse matrix problems, which is
O(n^{5/3}). We also show that implementation on a parallel architecture decreases
the computational complexity to as little as O(n) and O(n^{7/6}) in 2D and 3D,
respectively. In the additive setup, the method is suitable for the application of inexact
subdomain solvers. Chapter 4 discusses the application of inexact components in
substructuring methods. The issues of the influence of approximate subdomain
solvers as well as issues of approximate formulation of the method and enhanc-
ing the sparsity of the coarse level problem are studied here. Using the resulting
method as a preconditioner in the conjugate gradient method, the condition num-
ber is proved to be bounded by C(1 + log(H/h))², with constant C dependent
on the quality of the approximation used. Estimates for C are given. Chapter 5
describes an algebraic multilevel technique using smoothed transfer operators.
The resulting method, called MLS, is based on the same concepts as the one
of Chapter 3, but no subdomain problems are solved. Instead, a combination
of smoothing and coarse space correction reminiscent of the multigrid V-cycle
takes place. This leads to a significantly reduced computational complexity of
O(n^{7/6}) and O(n^{6/5}) for a serial implementation in 2D and 3D, respectively, as
opposed to O(n^{4/3}) and O(n^{49/33}) in the case of BOSS. An optimal convergence
estimate is also given. Chapter 6 presents the results of numerical experiments.
The robustness of these methods with respect to a number of aspects has been
tested by solving both artificial and real-world problems. The comparison of the
results suggests BOSS to be the most robust of the three methods with respect
to a number of influences, including discontinuities in coefficients of arbitrary
pattern and complex geometries. Finally, Chapter 7 gives a brief summary and
suggests directions for future implementation improvements and research.
1.2 Formulation of the Problem and Notation
Let Ω be a bounded domain in ℝ^d, d ∈ {2,3}, with a piecewise smooth
boundary ∂Ω, and ∂Ω = Γ̄_D ∪ Γ̄_N with Γ_D, Γ_N disjoint, meas(Γ_D) > 0. We
will denote by Γ_D, Γ_N the parts of the boundary with Dirichlet and Neumann
boundary conditions, respectively. A coarse triangulation is needed in some
of the methods we will discuss. This can be achieved by decomposing Ω into
nonoverlapping subdomains Ω_i, i = 1, ..., J,

    Ω̄ = Ω̄_1 ∪ Ω̄_2 ∪ ... ∪ Ω̄_J,    (1.1)

such that ∂Ω_i does not cut through any element. Other forms of decomposition
will also be discussed in the text whenever appropriate. Consider the model
problem

    Lu = f in Ω,   u = g on Γ_D,   ∂u/∂ν_L = 0 on Γ_N,    (1.2)

where

    Lu = -∑_{r,s=1}^{d} ∂/∂x_r ( a(x) β_{rs}(x) ∂u/∂x_s ),    (1.3)

with the coefficient matrix (β_{rs}) uniformly positive definite, bounded and piece-
wise smooth on Ω̄, smooth on each Ω̄_i, and a(x) a positive constant in each
subdomain Ω_i, i.e.,

    a(x) = α_i > 0 for x ∈ Ω_i.

Variation in the value of a(x) is allowed across substructure interfaces. It is possi-
ble to relax this assumption to the case when a(x) varies moderately within each
subdomain. Problem (1.2) can be restated in terms of an equivalent variational
Galerkin problem

    u ∈ V:  a(u, v) = f(v)   ∀v ∈ V,    (1.4)
where V = H¹_{Γ_D}(Ω) denotes the Sobolev space (cf. [72, 1]) of H¹(Ω) functions
vanishing on Γ_D ⊂ ∂Ω, meas(Γ_D) ≥ C meas(∂Ω).
The finite element discretization reduces problem (1.4) to a linear sys-
tem of algebraic equations

    Ax = f,    (1.5)

where A is the stiffness matrix with entries

    a_{ij} = a(φ_j, φ_i) = ∫_Ω ∑_{r,s=1}^{d} a(x) β_{rs}(x) (∂φ_j/∂x_s)(∂φ_i/∂x_r) dx,

f_i = (f, φ_i) is the load vector, and the φ_i are the standard finite element basis
functions.
We assume the finite element space to be the usual conforming P1 or
Q1 space [23]. We define the subdomain decomposition interface

    Γ = ∪_{i=1}^{J} ∂Ω_i \ Γ_D    (1.6)

and, for simplicity, assume that Γ̄ is the union of the closures of whole faces
of some or all of the substructure boundaries. We associate parameters H, h
with the coarse and local mesh, respectively. We assume that the elements and
subdomains are shape regular in the usual sense [23].
It is easy to see that under our assumptions the bilinear form a(·,·) satis-
fies the usual assumptions of symmetry, V-ellipticity and boundedness (cf. [23]):

    c₁ ‖u‖²_{H¹(Ω)} ≤ a(u, u) ≤ c₂ ‖u‖²_{H¹(Ω)}   ∀u ∈ V.    (1.7)
We note that these restrictions are assumed in order to simplify the
analysis of the methods. Some of the methods discussed in the text perform
well if some or even all of the aforementioned assumptions are violated. Solv-
ing problem (1.4) under these assumptions should be viewed as our minimal
goal; indeed, some of the discussed methods will be applicable to solving much
more general problems with weaker assumptions. Whenever appropriate, these
assumptions will be relaxed.
Let Ω̂ denote a reference domain of diameter O(1) (e.g., a square or cube
in 2D or 3D, respectively) and assume that the subdomains are of diameter
O(H) and shape regular, i.e.,

    Ω_i = F_i(Ω̂),   ‖∇F_i‖ ≤ CH,   ‖∇F_i^{-1}‖ ≤ CH^{-1},    (1.8)

with ∇F_i the Jacobian and ‖·‖ the Euclidean ℝ^d matrix norm.
Let V_h(Ω) be a conforming linear finite element space on a quasiuniform
triangulation T_h of Ω with a characteristic meshsize h, such that each subdomain
Ω_i is the union of some of the elements, and the usual shape regularity and
inverse assumptions hold (cf. [23]). All functions v ∈ V_h(Ω) satisfy the homogeneous
boundary condition v = 0 on Γ_D.
Let V_h(Ω_i) be the space of the restrictions of functions in V_h(Ω) to Ω_i.
Throughout this thesis, C and c shall denote (unless specified otherwise) generic
positive constants independent of the shape or size of Ω and Ω_i. Note that
these constants may depend on the constant in (1.8) or on the regularity of the
triangulation, but they are independent of h and H.
Following [11], [28] and [81], we define the scaled Sobolev norms

    ‖u‖²_{1,Ω_i} = |u|²_{1,Ω_i} + (1/H²) ‖u‖²_{0,Ω_i},
    ‖u‖²_{1/2,∂Ω_i} = |u|²_{1/2,∂Ω_i} + (1/H) ‖u‖²_{0,∂Ω_i}.

The advantage of this definition is that it allows us to restrict all of our
considerations to the reference domain Ω̂ and use the mappings F_i to obtain the
results for each Ω_i from the obvious norm equivalence

    c ‖u‖_{1,Ω_i} ≤ ‖u ∘ F_i‖_{1,Ω̂} ≤ C ‖u‖_{1,Ω_i},
    c ‖u‖_{1/2,∂Ω_i} ≤ ‖u ∘ F_i‖_{1/2,∂Ω̂} ≤ C ‖u‖_{1/2,∂Ω_i}.    (1.9)
Assume that for each Ω_i, Γ_D ∩ ∂Ω_i is either empty or a part of ∂Ω_i of
size bounded below by a fixed proportion of the size of ∂Ω_i, so that the Poincaré
inequality (Theorem A.1.4) holds uniformly for all Ω_i, with the constant C in-
dependent of h and H:

    ‖u‖_{0,Ω_i} ≤ CH |u|_{1,Ω_i},   ‖u‖_{0,∂Ω_i} ≤ CH^{1/2} |u|_{1/2,∂Ω_i},    (1.10)

for all u ∈ V_h(Ω_i) if Γ_D ∩ ∂Ω_i ≠ ∅, and for all u ∈ V_h(Ω_i) with ∫_{∂Ω_i} u ds = 0 if
Γ_D ∩ ∂Ω_i = ∅.
Throughout this thesis, we will adopt the following notation: We denote
by σ(A) the set of all eigenvalues of A, and by ρ(A) the spectral radius of a matrix
A: ρ(A) = max{|λ| : λ ∈ σ(A)}. We shall use ⟨·,·⟩ and ‖·‖ to denote the Euclidean
inner product and Euclidean norm, respectively. If B is a symmetric positive
(semi)definite matrix, we also use the (semi)norm derived from this matrix, i.e.,

    ‖x‖_B = ⟨x, x⟩_B^{1/2} = ⟨Bx, x⟩^{1/2}.
For a matrix A positive definite and symmetric in the B-inner product,
the symbol cond_B(A) will denote the condition number, cond_B(A) = min λ_M/λ_m,
where the minimum is taken over all positive λ_m, λ_M such that

    λ_m ≤ ⟨Ax, x⟩_B / ⟨x, x⟩_B ≤ λ_M   ∀x ∈ ℝ^n.

For A = A^T we obtain the usual spectral condition number cond(A) =
‖A‖₂ ‖A^{-1}‖₂ = λ_max/λ_min, where λ_max and λ_min denote the largest and
smallest eigenvalues of A, respectively. Note that for linear systems with matrix
A resulting from finite element discretization of second order partial differential
equations, the condition number is proportional to h^{-2} (cf. Lemma A.1.7). The
definition can also be extended to a product of two B-symmetric positive definite
operators, i.e., to the mutual condition number of a preconditioned system with
preconditioner M, cond_B(M, A). In this case, finding positive constants λ_m, λ_M
such that

    λ_m ≤ ⟨Ax, x⟩_B / ⟨Mx, x⟩_B ≤ λ_M   ∀x ∈ ℝ^n    (1.11)

provides the estimate

    cond_B(M, A) ≤ λ_M / λ_m,

and the mutual condition number is defined as the minimum of λ_M/λ_m over all
λ_m, λ_M satisfying (1.11). Adopting this definition, we will denote, by abuse
of notation, cond_B(M^{-1}A) = cond_B(M, A). This is justified because the
quotients ⟨Ax, x⟩_B / ⟨Mx, x⟩_B and ⟨M^{-1}Ay, y⟩_B / ⟨y, y⟩_B have the same
extrema.
The symbol A^{(i)} will denote the local stiffness matrix corresponding to sub-
domain Ω_i, x^{(i)} the vector of degrees of freedom corresponding to all the elements
in Ω̄_i, and N_i the matrix with entries 0 or 1 mapping the degrees of freedom x^{(i)}
into global degrees of freedom, i.e., x^{(i)} = N_i^T x.
Each x^{(i)} may be split into the degrees of freedom x̄^{(i)} corresponding to
∂Ω_i ∩ Γ, called interface degrees of freedom, and the remaining interior degrees
of freedom x̃^{(i)} (thus the degrees of freedom on ∂Ω are considered interior
degrees of freedom). We will often find it useful to split the local subdomain matrices
and the matrices N_i accordingly:

    x^{(i)} = [ x̄^{(i)} ; x̃^{(i)} ],
    A^{(i)} = [ A^{(i)}_{11}  A^{(i)}_{12} ; A^{(i)T}_{12}  A^{(i)}_{22} ],
    N_i = [ N̄_i , Ñ_i ],    (1.12)

where A^{(i)}_{11}, A^{(i)}_{22} are the blocks corresponding to the interface and interior
degrees of freedom, respectively.
Similarly, let N̄ and Ñ denote the matrices with entries 0 or 1 mapping
the degrees of freedom x̄ on Γ into global degrees of freedom, i.e., x̄ = N̄^T x,
and mapping between the degrees of freedom in Ω \ Γ and the global degrees of
freedom, respectively.
1.3 Direct Solvers
In this section, we briefly remark on some direct methods of solution
of linear algebraic systems commonly used in practice. Although the iterative
methods had actively been used in finite element computations well into the 1960s,
the availability of new hardware with larger storage made it possible to use
direct solvers. For the last three decades, the majority of commercial finite ele-
ment codes have relied on direct methods to solve linear systems of equations.
Most of today's direct solvers proceed by first factoring the matrix A (e.g., by
Cholesky or Crout factorization) and then using back-substitution. The sim-
plicity and robustness of the direct methods have gained them acceptance in
the two-dimensional finite element analysis. However, their application to solv-
ing practical three-dimensional problems of even relatively modest sizes poses a
challenge for the in-core storage capacity and execution capabilities of current
high-end computers. For the sake of argument let us consider two model prob-
lems: a square consisting of n × n Q1 elements and a cube of n × n × n Q1
elements. In order to solve a simple diffusion problem on our 2D model mesh,
elements. In order to solve a simple diffusion problem on our 2D model mesh,
we will need to perform 0(n4) floating point operations to obtain the factoriza-
tion and the storage required to carry out the computation will be 0(n3) words.
The demands grow severely higher for the 3D model mesh, where the require-
ments are 0(n7) and 0(n5) for the asymptotic cost of factorization and storage,
respectively. The asymptotic cost of back-substitution is negligible compared to
the cost of the factorization, as it is 0(n3) and 0(n5) for the 2D and 3D model
mesh, respectively. There are direct methods requiring less work than the profile
method discussed above. Optimal in terms of computational complexity is the
method of nested dissection [41] requiring 0(n3) and 0(n6) operations for our
example in 2D and 3D, respectively. Although this is less than in the case of the
lexicographically ordered Cholesky factorization, it is still too expensive.
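To put these exponents in perspective, the following short sketch (our
illustration, not part of the original text; the function name is hypothetical)
tabulates the model-problem estimates quoted above, using N = n^d unknowns
and bandwidth w = n^{d-1} for the profile factorization:

def profile_cholesky_cost(n, dim):
    # Model-problem estimates for a profile (banded Cholesky) factorization
    # of an n x n (x n) grid of Q1 elements: N = n**dim unknowns with
    # bandwidth w = n**(dim - 1) give ~ N*w**2 flops and ~ N*w words.
    N = n ** dim
    w = n ** (dim - 1)
    return N * w ** 2, N * w  # (factorization flops, storage words)

for n in (32, 64, 128):
    work2, mem2 = profile_cholesky_cost(n, 2)  # O(n^4) work, O(n^3) storage
    work3, mem3 = profile_cholesky_cost(n, 3)  # O(n^7) work, O(n^5) storage
    print(f"n={n}: 2D {work2:.1e} flops / {mem2:.1e} words, "
          f"3D {work3:.1e} flops / {mem3:.1e} words")

Doubling n in 3D multiplies the factorization cost by roughly 2^7 = 128, which
is exactly the growth that makes direct solvers prohibitive on the model mesh.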
Of course, our model meshes are among those least suitable for direct
solvers. Indeed, there exist 3D meshes for which application of direct solvers is
more than appropriate. For instance, a single strip of 3D elements may be
quite easily solved by a direct solver if the degrees of freedom are properly
(re)numbered. It is intuitively clear that the efficiency of direct solvers depends
on the connectivity of the mesh. In order to evaluate the suitability of applying
direct solvers, Hughes, Ferencz and Hallquist developed the concept of the
fractal dimension, allowing one to measure mesh connectivity (cf. [49] for details).
The larger the fractal dimension of the mesh, the more beneficial the application
of iterative methods may be. An alternative measure allowing insight into the role
of mesh connectivity and its influence on the performance of direct solvers is the
Euclidean norm of the lengths of the skyline profiles.
In this thesis we focus on iterative methods as a superior replacement
of direct solvers for problems on large unstructured 3D meshes with high fractal
dimension, where their superiority over direct solvers can be most easily achieved.
But it is the experience of the author that many of these methods prove to
be worthy rivals of direct solvers even for problems with rather small fractal
dimensions and in 2D. For a comprehensive study of a variety of direct methods
of solution, appropriate data structures and other related topics, we refer the
reader to [42].
1.4 Preconditioned Conjugate Gradient Acceleration
This section provides an overview of the conjugate gradient method
introduced by Hestenes and Stiefel [46]. Although a well-known method, we find
the PCG method to be of such a paramount importance that we will describe it
in detail. The treatment here follows mostly the exposition of Ashby, Manteuffel
and Saylor [3], allowing more generality than traditional references. Also, it is the
opinion of the author that this point of view is more transparent than alternative
ones. Other useful discussions appear, among others, in Luenberger [60], Concus,
Golub and O'Leary [24], and Golub and Van Loan [44].
Let A : V → V be an SPD operator with respect to the inner product
⟨·,·⟩. We need to solve an equation of the following type:

    Au = b.    (1.13)
Let n = dim(V). The conjugate gradient method belongs to the family of
polynomial iterative methods of the form

    u_{i+1} = u_i + ∑_{j=0}^{i} γ_{ij} r_j,

where r_i denotes the residual in step i, r_i = b - Au_i. From here, denoting by u*
the exact solution, we easily obtain the equation for the error e_i = u* - u_i,

    e_{i+1} = R_{i+1}(A) e_0,

where R_{i+1} denotes the residual polynomial (R_{i+1}(0) = 1) of degree less than
or equal to i + 1. Because R_{i+1}(0) = 1, we have

    u_{i+1} = u* - R_{i+1}(A) e_0
            = u_0 + e_0 - R_{i+1}(A) e_0
            = u_0 + (I - R_{i+1}(A)) e_0
            = u_0 + Q_i(A) A e_0
            = u_0 + Q_i(A) r_0.

Let us denote by K_i(r_0, A) the Krylov subspace of dimension at most i generated
by r_0 and A. That is,

    K_i = span{r_0, A r_0, ..., A^{i-1} r_0}.

Thus we can rewrite the last equation as

    u_{i+1} = u_0 + d_i,   d_i ∈ K_{i+1}(r_0, A),    (1.14)

or equivalently

    u_{i+1} = u_i + d̃_i,   d̃_i ∈ K_{i+1}(r_0, A).    (1.15)
Equations (1.14), (1.15) define a general gradient method.
Let B be a Hermitian positive definite matrix. We define CG(B,A) to
be the gradient method that chooses

    d_k ∈ K_{k+1}(r_0, A)

so that the norm ‖e_{k+1}‖_B is minimized over K_{k+1}(r_0, A):

    ‖e_{k+1}‖_B → min over K_{k+1}(r_0, A).    (1.16)

This is the optimality property of the method. Different choices of the matrix B
result in different methods. Recall that K_k ⊂ K_{k+1}. Thus for all w ∈ K_k we
have

    0 = ⟨Be_{k+1}, w⟩
      = ⟨B(e_k - d_k), w⟩
      = ⟨Be_k, w⟩ - ⟨Bd_k, w⟩
      = -⟨Bd_k, w⟩,

as ⟨Be_k, w⟩ = 0 from the previous step. Therefore,

    ⟨Bd_k, w⟩ = 0   ∀w ∈ K_k(r_0, A).

In other words,

    d_k ∈ K_{k+1}(r_0, A)  and  d_k ⊥_B K_k(r_0, A).

This defines d_k up to a scalar. Thus, we can let

    d_k = α_k p_k,
where p_k ∈ K_{k+1} is some conveniently chosen vector and the scalar α_k is yet
to be determined.
To find d_k we will construct a B-orthogonal basis {p_j}_{j=0}^{k} for K_{k+1}. We
can use a generalization of the Gram-Schmidt process to the B-inner product to
do this:

    p_0 = r_0,
    p_{j+1} = A p_j - ∑_{i=0}^{j} β_{ji} p_i,   β_{ji} = ⟨BAp_j, p_i⟩ / ⟨Bp_i, p_i⟩.

By construction,

    p_j ∈ K_{j+1}  and  p_j ⊥_B K_j,

and so

    d_j = α_j p_j

for some α_j. We can once again use the orthogonality e_{j+1} ⊥_B K_{j+1} to
determine α_j:

    0 = ⟨Be_{j+1}, p_j⟩
      = ⟨B(e_j - α_j p_j), p_j⟩ = ⟨Be_j, p_j⟩ - α_j ⟨Bp_j, p_j⟩,

and therefore

    α_j = ⟨Be_j, p_j⟩ / ⟨Bp_j, p_j⟩.

This value is always well-defined because B is assumed Hermitian positive defi-
nite. Note, however, that this value is expressed in terms of the unknown error
e_j, which raises the question of its computability. In order to compute α_j, we will
need additional assumptions on B. For our purposes, it will suffice to assume
that B = p(A), where p(x) is an arbitrary polynomial with zero absolute term,
i.e., p(0) = 0.
Thus, in practice, the vector d_j satisfying the optimality property (1.16) is
obtained by imposing the equivalent B-orthogonality condition

    ⟨Be_{j+1}, z⟩ = 0   ∀z ∈ K_{j+1}(r_0, A).    (1.17)
The above discussion defines the conjugate gradient method CG(B,A).
Different implementations of the method exist. These different algorithms are
mathematically equivalent for a symmetric and positive definite matrix A. Also,
by different choices of the matrix B, we recover various known methods (setting
B = A, for instance, yields the original method of Hestenes and Stiefel). For
a systematic study we refer to Ashby, Manteuffel and Saylor [3]. As we will
mostly be dealing with symmetric positive definite problems, we only recall here
the so-called Orthomin implementation because of its superior computational
complexity and storage requirements.
Algorithm 1 (Orthomin). For a given initial guess u_0 ∈ ℝ^n, set
r_0 = b - Au_0, p_0 = r_0, and
repeat

    1. α_i = ⟨Be_i, p_i⟩ / ⟨Bp_i, p_i⟩,
    2. u_{i+1} = u_i + α_i p_i,
    3. r_{i+1} = r_i - α_i A p_i,
    4. β_{ij} = ⟨BAe_{i+1}, p_j⟩ / ⟨Bp_j, p_j⟩,   j = 0, ..., i,
    5. p_{i+1} = r_{i+1} - ∑_{j=0}^{i} β_{ij} p_j,

until convergence;
This implementation of the conjugate gradient method will work for any
matrix A provided all the coefficients α_i ≠ 0. Algorithm 1 in general requires
storing all previous direction vectors p_j in order to compute β_{ij}. Under our
assumption that A be symmetric positive definite, however, the recursion in
Step 4 naturally truncates so that only the last direction vector needs to be
stored. The necessary and sufficient conditions for the existence of this and other
economical recursions in the conjugate gradient method were studied in Faber
and Manteuffel [35].
The advantage of looking at the conjugate gradient method in the way
described above is that it simplifies the analysis of the error reduction. Note
that owing to (1.15) and (1.17) we may view each step of the conjugate gradient
method as eliminating the B-orthogonal Galerkin projection of the unknown
error onto the current Krylov space given by r0 and A.
Since K_n(r_0, A) = V and e_n ⊥_B K_n(r_0, A), we have e_n = 0, assuming
infinitely precise arithmetic. In other words, CG(B,A) converges in at most
n steps. This is how the conjugate gradient method was originally perceived.
Computational experiments have shown that the round-off error can cause loss
of orthogonality in practical applications. That is why today the method is
primarily viewed as an iterative rather than direct one, as first suggested by
Reid [75].
It is often useful to replace the problem Au = b by a related problem
M^{-1}Au = M^{-1}b, where M is a symmetric positive definite matrix. The matrix
M is called a preconditioner. The matrix M^{-1}A is no longer symmetric in the
⟨·,·⟩ inner product, but we note that it is symmetric with respect to the energetic
inner product ⟨A·,·⟩ and with respect to the inner product ⟨M·,·⟩. Therefore, we
can now apply the conjugate gradient method to the matrix Ã = M^{-1}A and
b̃ = M^{-1}b. By doing so, we arrive at the preconditioned conjugate gradient
method (PCG). The following is the Orthomin implementation of PCG:
Algorithm 2 (Orthomin implementation of PCG). For the given
initial guess u_0 ∈ ℝ^n, set r_0 = b - Au_0, solve Mp_0 = r_0, and
repeat

    1. α_i = ⟨Be_i, p_i⟩ / ⟨Bp_i, p_i⟩,
    2. u_{i+1} = u_i + α_i p_i,
    3. r_{i+1} = r_i - α_i A p_i,
    4. solve Mz_{i+1} = r_{i+1},
    5. β_i = ⟨Bz_{i+1}, p_i⟩ / ⟨Bp_i, p_i⟩,
    6. p_{i+1} = z_{i+1} - β_i p_i,

until convergence;
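As a concrete illustration (ours, not from the thesis), the following sketch
implements Algorithm 2 for the common choice B = A, i.e., the preconditioned
method of Concus, Golub and O'Leary used in Theorem 1.4.1 below; for B = A
the recursion truncates to a single previous direction, and the β update below is
equivalent to step 5 in exact arithmetic. The matrix A is assumed SPD and
M_solve is a callable applying M^{-1}:

import numpy as np

def pcg(A, b, M_solve, u0=None, tol=1e-8, maxit=500):
    # Preconditioned conjugate gradients (Orthomin form, B = A).
    # A: SPD matrix; M_solve: callable returning z = M^{-1} r.
    u = np.zeros_like(b) if u0 is None else u0.copy()
    r = b - A @ u
    z = M_solve(r)
    p = z.copy()
    rz = r @ z
    for k in range(maxit):
        Ap = A @ p
        alpha = rz / (p @ Ap)      # step 1: minimizes the A-norm of the error along p
        u += alpha * p             # step 2
        r -= alpha * Ap            # step 3
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return u, k + 1
        z = M_solve(r)             # step 4: preconditioning solve M z = r
        rz_new = r @ z
        beta = rz_new / rz         # equivalent to step 5 for B = A
        p = z + beta * p           # step 6: only the last direction is kept
        rz = rz_new
    return u, maxit

# usage with a Jacobi (diagonal) preconditioner:
# u, its = pcg(A, b, lambda r: r / np.diag(A))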
Note that the computational complexity of this algorithm differs from
the Orthomin algorithm without preconditioning by the cost of the additional
preconditioning step 4. The following theorem offers a motivation for employing
a preconditioner. For simplicity, we will restrict the proof to the case of the
preconditioned conjugate gradient method of Concus, Golub and O'Leary [24],
i.e., we set B = A.
Theorem 1.4.1. For the error in step k of the conjugate gradient method
applied to the system M^{-1}Au = M^{-1}b, it holds that

    ‖u - u_k‖_A ≤ 2 [ (√cond(M^{-1}A) - 1) / (√cond(M^{-1}A) + 1) ]^k ‖u - u_0‖_A.    (1.18)
Proof. Let us set r̄_0 = M^{-1}(b - Au_0). As u_k is an A-orthogonal
projection, we have

    ⟨A(u - u_k), u - u_k⟩ ≤ ⟨A(u - v), u - v⟩   ∀v ∈ u_0 + K_k(r̄_0, M^{-1}A).

Now for an arbitrary polynomial p_{k-1} of degree k - 1, taking

    v = u_0 + p_{k-1}(M^{-1}A) r̄_0 = u_0 + M^{-1}A p_{k-1}(M^{-1}A)(u - u_0),

we have

    ⟨A(u - u_k), u - u_k⟩
      ≤ min_{p_{k-1}} ⟨A(I - M^{-1}A p_{k-1}(M^{-1}A))(u - u_0),
                      (I - M^{-1}A p_{k-1}(M^{-1}A))(u - u_0)⟩
      ≤ min_{q_k(0)=1} ⟨A q_k(M^{-1}A)(u - u_0), q_k(M^{-1}A)(u - u_0)⟩
      ≤ min_{q_k(0)=1} max_{λ ∈ σ(M^{-1}A)} |q_k(λ)|² ⟨A(u - u_0), u - u_0⟩.

It is a well-known result of approximation theory that the solution of
this minimax problem can be found in terms of the Chebyshev polynomials

    T_k(x) = ½ ((x + √(x² - 1))^k + (x - √(x² - 1))^k)

scaled to the interval of the spectrum [λ_min(M^{-1}A), λ_max(M^{-1}A)]. We can then
deduce that

    min_{q_k(0)=1} max_{λ ∈ σ(M^{-1}A)} |q_k(λ)| ≤ 2 [ (√cond(M^{-1}A) - 1) / (√cond(M^{-1}A) + 1) ]^k,

from where (1.18) follows.
By setting B = A, M = I, we recover the error estimate for the
original conjugate gradient method of Hestenes and Stiefel.
We notice that this estimate for the convergence rate of the conjugate
gradient method depends strongly on the condition number. Namely, the num-
ber of steps required to reduce the error by a given factor is proportional to
√cond(M, A). Even though the estimate (1.18) often turns out to be rather
pessimistic (e.g., if the matrix A has clustered eigenvalues), we can see that the
selection of the preconditioner M is crucial to the performance of the method.
Theorem 1.4.1 thus gives an answer to two questions: It shows that the addi-
tional cost of preconditioning can be justified and it clarifies the sense in which
the preconditioner M should be related to A if good convergence is to be guar-
anteed.
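To make this dependence concrete, a quick numeric check (our addition, not
from the text) evaluates the smallest k for which the bound (1.18) guarantees an
error reduction by a factor ε; the count grows roughly like √cond(M^{-1}A) · log(1/ε):

import math

def guaranteed_iterations(kappa, eps=1e-6):
    # Smallest k with 2*((sqrt(kappa)-1)/(sqrt(kappa)+1))**k <= eps,
    # i.e., the iteration count guaranteed by the bound (1.18).
    rho = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)
    return math.ceil(math.log(eps / 2.0) / math.log(rho))

for kappa in (10, 100, 1000, 10000):
    print(kappa, guaranteed_iterations(kappa))
# e.g. 23, 73, 230, 726 for eps = 1e-6: a hundredfold increase in the
# condition number costs roughly a tenfold increase in iterations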
Based on these facts, we will seek preconditioners such that the problem
Mz = r is more easily solved than the problem Au = b. This will guarantee that
the cost of applying each iterative step of PCG will be small. We will want M
to approximate the spectral properties of A in the sense that cond(M^{-1}A) is small.
This will guarantee that the number of steps necessary to reduce the error to a
required size will be small. The quantity cond(M, A) = cond(M^{-1}A) will serve
as a measure of the quality of a preconditioner.
We have shown that preconditioning can greatly improve the conver-
gence properties of the conjugate gradient method. Our work will consist in
finding suitable preconditioners. Sometimes these preconditioners are explicitly
available in matrix form, but most often we will find it convenient to con-
struct iterative methods to compute the approximation of A^{-1}r_k needed in
step 4 of Algorithm 2. These iterative methods, if used as a preconditioner, improve
the convergence rate of PCG. We can also adopt a dual understanding of precon-
ditioning with respect to linear symmetric iterative methods, namely that
PCG is often viewed as a means of acceleration of these methods. In order to
justify this claim, let us consider a linear iterative solver in the form

    x_{k+1} = Mx_k + Nb    (1.19)

consistent with the problem Au = b, i.e., having the same solution as Au = b.
Here N is assumed symmetric, hence M is A-symmetric, because it follows from
the consistency and the fact that A is SPD that A^{-1}b = MA^{-1}b + Nb ∀b, so
M = I - NA and

    ⟨AMx, y⟩ = ⟨A(I - NA)x, y⟩ = ⟨Ax, (I - NA)y⟩ = ⟨Ax, My⟩.

Note that all the classical splitting-type iterative schemes such as Jacobi, as well
as the variational multigrid methods, may be written in this form (see Section 4.2
for more details). The consistency condition allows us to write, alternatively,

    x_{k+1} = x_k + N(b - Ax_k),

hence the method may be viewed as a preconditioned Richardson iteration whose
preconditioner has the inverse N.
This process converges provided that

    ‖M‖_A = q < 1,

i.e., q = ‖I - NA‖_A < 1. Since we can easily verify that

    cond(NA) ≤ (1 + q) / (1 - q),

we can see that the application of the preconditioned conjugate gradient method
with preconditioner N (in other words, the preconditioner given by one iteration
of the process (1.19) with x_0 = 0) yields a superior convergence factor, as

    (√cond(NA) - 1) / (√cond(NA) + 1)
      ≤ (√((1+q)/(1-q)) - 1) / (√((1+q)/(1-q)) + 1)
      = (1 - √(1 - q²)) / q ≤ q   ∀q ∈ (0,1).    (1.20)

Hence, for a symmetric linear iterative scheme, a preconditioner for A can be
found so that the convergence rate of the scheme is accelerated by using
PCG with that preconditioner. This concludes our brief discussion of the pre-
conditioned conjugate gradient method.
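The acceleration promised by (1.20) can be checked numerically; the sketch
below (our addition) compares the convergence factor q of the linear scheme with
the PCG factor derived from cond(NA) ≤ (1 + q)/(1 - q):

import math

def accelerated_factor(q):
    # PCG convergence factor when preconditioned by one step of a symmetric
    # linear iteration with q = ||I - NA||_A, using cond(NA) <= (1+q)/(1-q).
    kappa = (1.0 + q) / (1.0 - q)
    return (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)

for q in (0.5, 0.9, 0.99):
    print(q, round(accelerated_factor(q), 4))
# 0.5 -> 0.2679, 0.9 -> 0.6268, 0.99 -> 0.8676: always below q itself,
# in agreement with (1 - sqrt(1 - q**2))/q <= q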
2. Overview of Some Domain Decomposition
Methods
In this chapter, we will give an overview of some known domain decom-
position methods. Because the domain decomposition field has grown very large
since the first studies, this overview cannot be exhaustive. Instead, we list a few
methods we will draw on in the derivation of our new methods. Some methods
are presented with the intention to contrast the differences with our approach.
We also attempt to direct the reader to the appropriate sources dealing with
related topics not covered in this thesis.
2.1 Abstract Schwarz Methods
In this section, we outline the theory of Schwarz methods, a useful tool
for analyzing many domain decomposition methods.
In recent years a lot of effort has been put into a systematic study of
various types of the domain decomposition methods, generalizations of the al-
ternating method of Schwarz [80]. Some of the pioneers of this work include,
among others, Bjørstad and Mandel [5], Bramble, Pasciak and Schatz [10],
Dryja, Proskurowski and Widlund [29], and Matsokin and Nepomnyaschikh [71].
The Schwarz methods form a class of iterative methods for solving problems
arising from discretization of partial differential equations. These methods can
be fully described by dividing the original solution space V of (1.4) into sub-
spaces V_i ⊂ V such that

    ∀v ∈ V:  v = v_0 + v_1 + ... + v_J,   where v_i ∈ V_i,    (2.1)

and defining bilinear forms a_i(·,·) on V_i × V_i. These forms are assumed coercive
and bounded on V_i with respect to the a(·,·)-norm. These assumptions will be
rigorously formulated in Theorem 2.1.1. Note that the decomposition in (2.1)
may not always be unique. The symbol V_0 denotes the so-called coarse space,
whose main purpose is to expedite the global correction of the error. Sometimes
this space fulfills other roles, as we will see in Section 2.7. The solution in Schwarz
methods is iteratively corrected by adding the approximation of the current error
on the subspaces V_i. To this end we define operators T_i : V → V_i as

    a_i(T_i w, v) = a(w, v)   ∀v ∈ V_i,   i = 0, ..., J.    (2.2)

The operators T_i are thus defined uniquely up to a possible shift in the kernel of
a_i(·,·). All the methods formulated in the text will be independent of this shift.
Note that if we set a_i(u, v) = a(u, v) ∀u, v ∈ V_i, then T_i is an a(·,·)-orthogonal
projection onto V_i. This is why we refer to the T_i as generalized projections.
Using the operators T_i, a variety of methods may be constructed. They
differ in several aspects. The subspaces V_i may correspond to overlapping or
nonoverlapping subdomains Ω_i ⊂ Ω, allowing non-unique or unique representation
of a function u ∈ V, respectively.
Schwarz methods can naturally be classified as multiplicative or addi-
tive, based on the manner in which the correction of the error is done. The
former proceeds along lines similar to the Gauss-Seidel method, while the lat-
ter is reminiscent of the Jacobi method (cf. [94, 55]). In either case, we can take
advantage of the fact that the (approximate) projection of the error u* - u can
be obtained without the knowledge of the exact solution u*, using the definition
of T_i:

    a_i(T_i(u* - u), v) = a(u* - u, v) = f(v) - a(u, v)   ∀v ∈ V_i.    (2.3)
The multiplicative Schwarz method starts from u ← 0 and proceeds by
updating the current approximate solution u ← u + u_i, where

    u_i = T_i(u* - u) ∈ V_i,   i = 0, ..., J,

is computed from (2.3). This process results in a nonsymmetric operator. If the
method is to be used as a preconditioner in PCG, it is desirable to work with a
symmetric operator instead, in which case one can apply the updates once in
forward and once in backward order.
The additive Schwarz method is defined as

    u^{k+1} = u^k + τ ∑_{j=0}^{J} u_j = u^k + τ ∑_{j=0}^{J} T_j(u* - u^k),    (2.4)

where τ is an additional correction parameter. With the operator C defined by

    CA = ∑_{i=0}^{J} T_i = T,    (2.5)

we can rewrite the method (2.4) as

    u^{k+1} = u^k + τC(f - Au^k).    (2.6)

Hence, we may view the operator C as an approximate solver for Au = f, defined by

    Cf = ∑_{i=0}^{J} u_i = ∑_{i=0}^{J} T_i u,   u_i ∈ V_i,

where

    a_i(u_i, v) = ⟨f, v⟩   ∀v ∈ V_i.

Application of (2.6) by itself may be fruitful if τ is chosen so that ρ(τCA) < 2. A
more common approach, however, is to use C as a preconditioner in the conjugate
gradient method for problem (1.4). Then we obtain

    cond(CA) = λ_max(T) / λ_min(T),

i.e., the convergence properties of the method may be studied through the spectrum
of the sum of the approximate projections T_i.
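As an illustration of (2.5)-(2.6) (our sketch, with hypothetical 0/1 restriction
matrices R_i standing in for the subspaces V_i, the coarse space included as R_0),
the additive Schwarz operator C with exact subspace solves a_i = a restricted to
V_i can be coded as follows:

import numpy as np

def additive_schwarz(A, restrictions):
    # Return a callable r -> C r with C A = sum_i T_i as in (2.5), where
    # each term R^T (R A R^T)^{-1} R realizes one generalized projection T_i
    # (exact subspace solve). `restrictions` is a list of 0/1 matrices R_i.
    local = [(R, np.linalg.inv(R @ A @ R.T)) for R in restrictions]
    def apply(r):
        z = np.zeros_like(r)
        for R, Sinv in local:
            z += R.T @ (Sinv @ (R @ r))
        return z
    return apply

# Used as the preconditioner in PCG (cf. the sketch in Section 1.4):
# C = additive_schwarz(A, [R0, R1, ..., RJ]); u, its = pcg(A, b, C)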
This path has been taken by Bjørstad and Mandel [5], Dryja, Smith
and Widlund in [30], [32], and in a different setup by Xu [94]. The following
theorem summarizes the results of their research.
Theorem 2.1.1 (Dryja, Widlund [32]). Let T = ∑_{i=0}^{J} T_i and assume:
(i) there exists a constant C_0 such that for all u ∈ V there exists a decomposition
u = ∑_{i=0}^{J} u_i, u_i ∈ V_i, such that

    ∑_{i=0}^{J} a_i(u_i, u_i) ≤ C_0² a(u, u);

(ii) there exists a constant ω such that for i = 0, ..., J,

    a(u, u) ≤ ω a_i(u, u)   ∀u ∈ V_i;

(iii) there exist constants ε_{ij}, i, j = 1, ..., J, such that

    a(u_i, u_j) ≤ ε_{ij} a(u_i, u_i)^{1/2} a(u_j, u_j)^{1/2}   ∀u_i ∈ V_i, ∀u_j ∈ V_j.

Then T is invertible and

    C_0^{-2} a(u, u) ≤ a(Tu, u) ≤ (ρ(ε) + 1) ω a(u, u)   ∀u ∈ V.    (2.7)

Here ρ(ε) denotes the spectral radius of the matrix ε = {ε_{ij}}_{i,j=1}^{J}.
Proof. The theorem will be proved as an application of the more
general framework of Section 4.3.1. Another approach may be found in [94].
Remark 2.1.2. Assumption (i) of Theorem 2.1.1 by itself guarantees
that λ_min(T) ≥ C_0^{-2}. This result is known as the generalized Lions lemma. We
will need it in its original form in Section 3.1, where it is stated as Lemma 3.1.8.
The scale of methods is not limited to additive and multiplicative; one
may combine them together. For instance, Cai [18] proposes to apply the mul-
tiplicative approach to the local solves (i = 1, ..., J), while treating the space
V_0 in an additive fashion with respect to V_1, ..., V_J. This allows solving the
coarse space problem in parallel with the local problems, and the local solves
take advantage of the more rapid convergence of multiplicative methods. Such
an approach, however, inhibits parallelization among the local solves. Another pos-
sibility (cf. Mandel [64]) is to treat the local problems in an additive way, while a
coarse problem of modest size is added in a multiplicative fashion. This approach
seems preferable because all the local problems may be solved in parallel, and
the multiplicative coarse grid update is more efficient than an additive one.
2.2 Overlapping Schwarz
Possibly the most studied of domain decomposition type methods is the
overlapping domain decomposition method ([80], [59], [31], [27]). In this method,
the original domain Ω is covered by a collection of overlapping subdomains. The
additive or multiplicative updates of the solution described in the previous section
are then applied. Using the definition of the (generalized) projections T_i, the error
propagation operator of the multiplicative method can be written as

    (I - T_0)(I - T_J)(I - T_{J-1}) ··· (I - T_1),

and the error propagation operator corresponding to the additive one is

    I - τ ∑_{i=0}^{J} T_i.
Overlapping methods with a coarse space usually offer an improvement over the
optimal non-overlapping methods in the sense that in the overlapping case with
a coarse space the condition number can typically be bounded (cf. [33]) as
    cond(M^{-1}A) ≤ C(1 + H/δ),

where δ denotes the size of the overlap of the subdomains. This better estimate
comes at a price, though, as for three dimensional problems even a small over-
lap amounts to a significant increase in size of the local problems. This fact
is especially pronounced if the subdomains are small. Another disadvantage of
these methods is that their setup is usually not very flexible. Typically, when
generating a system of overlapping subdomains, one starts from a system of
nonoverlapping ones and adds new layers of elements around each of them. This
is a cumbersome procedure and also one of the reasons why substructuring meth-
ods seem to enjoy more popularity today. Yet another issue troubling practical
usability of overlapping Schwarz methods is the construction of a coarse space.
For unstructured meshes it is difficult or impossible, using the traditional geomet-
ric approach, to construct an automatic procedure yielding a coarse space that
would be a subspace of the original finite element space; this presents the technical
difficulty of having to deal with nonnested spaces (cf. [21, 20]).
The advantage of overlapping additive Schwarz methods is that they
easily lend themselves to the application of inexact subdomain solvers.
In Chapter 3 we will design and analyze an efficient black-box over-
lapping Schwarz method with a new coarse space avoiding the pitfalls of the
traditional approach.
2.3 Non-overlapping Decomposition Methods
As noted in the previous section, the application of the traditional over-
lapping domain decomposition methods requires an undesirably complex mesh
partitioning stage if the desired measure of overlapping is to be guaranteed.
It is difficult to design automatic partitioning of complex geometries unless an
algebraic concept similar to the one described in Chapter 3 is used.
Nonoverlapping techniques simplify the partitioning stage significantly,
being able to use a variety of existing aggregation or greedy algorithms [37, 38].
2.4 Methods Reduced to Interface
Many of the substructuring methods are based on reducing the original
problem
Ax = b
to a problem defined only on the interface Γ common to at least two substructures
by eliminating all degrees of freedom not associated with any subdomain inter-
face. The initial problem is thus reduced to one whose unknowns are only the
interface degrees of freedom. As the interior degrees of freedom corresponding to
different subdomains have no coupling in the stiffness matrix A, the elimination
can be done locally on each subdomain and in parallel.
Let A^{(i)} denote the stiffness matrix of the bilinear form a_i(·,·) defined
on V × V, which is the restriction of the bilinear form a(·,·) to subdomain Ω_i,
a(u, v) = ∑_{i=1}^{J} a_i(u, v). For our model problem, given by the operator (1.3),
this is the stiffness matrix with entries

    A^{(i)}_{kl} = ∫_{Ω_i} ∑_{r,s=1}^{d} a(x) β_{rs}(x) (∂φ_l/∂x_s)(∂φ_k/∂x_r) dx
                   if nodes ν_k, ν_l lie in Ω̄_i.

The stiffness matrix A can be obtained by the standard process of subassembly

    A = ∑_{i=1}^{J} N_i A^{(i)} N_i^T.

The matrices S^{(i)} resulting from the elimination of the interior degrees of freedom
of the A^{(i)} are the local Schur complements. Writing the local stiffness matrices
A^{(i)} in the block form

    A^{(i)} = [ A^{(i)}_{11}  A^{(i)}_{12} ; A^{(i)T}_{12}  A^{(i)}_{22} ],

where A^{(i)}_{11} corresponds to the subdomain interface and A^{(i)}_{22} to the interior, the
local Schur complements have the following form:

    S^{(i)} = A^{(i)}_{11} - A^{(i)}_{12} A^{(i)-1}_{22} A^{(i)T}_{12}.    (2.8)

The Schur complement S corresponding to the global stiffness matrix A can then
be also obtained by the standard process of subassembly

    S = ∑_{i=1}^{J} N̄_i S^{(i)} N̄_i^T.    (2.9)
We can now solve the problem

    S x̄ = b̄,    (2.10)

where b̄ = b_1 - A_{12} A_{22}^{-1} b_2 is the reduced right-hand side obtained by the
elimination of the interior degrees of freedom (b_1, b_2 denoting the interface and
interior parts of b). Methods based on solving Schur complement problems like
this one instead of solving problems with the original stiffness matrices are labeled
substructuring methods by some authors. We will, however, use the term
substructuring methods for a broader class of nonoverlapping methods, including
the two-grid methods based on transfer operators with multilevel smoothing
described in Chapter 5.
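In matrix terms, the reduction (2.8)-(2.10) is a few lines of dense linear
algebra; the following sketch (ours, for illustration only) forms the Schur complement
and the reduced right-hand side from an index set of interface degrees of freedom:

import numpy as np

def schur_reduce(A, b, interface):
    # Eliminate the interior unknowns of the SPD system A x = b:
    # S = A11 - A12 A22^{-1} A12^T as in (2.8), and the reduced right-hand
    # side bbar = b1 - A12 A22^{-1} b2 of problem (2.10).
    idx = np.asarray(interface)
    rest = np.setdiff1d(np.arange(A.shape[0]), idx)
    A11 = A[np.ix_(idx, idx)]
    A12 = A[np.ix_(idx, rest)]
    A22 = A[np.ix_(rest, rest)]
    S = A11 - A12 @ np.linalg.solve(A22, A12.T)
    bbar = b[idx] - A12 @ np.linalg.solve(A22, b[rest])
    return S, bbar

# After solving S xbar = bbar, the interior unknowns are recovered by
# back-substitution: x_rest = A22^{-1} (b_rest - A12^T xbar).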
One advantage of the reduced methods is already obtained by solving
the problem with the Schur complement, which is significantly better conditioned
than the original matrix A (cf. Lemma A.1.7). It also has many fewer unknowns.
The early engineering applications of substructuring in the 1960s were solving
(2.10) by a direct solver (cf. [74]). Today, the reduced system (2.10) is usually
solved by a preconditioned conjugate gradient method; the preconditioner is typically
based on solution of the local subdomain problems.
Throughout the text we will find useful the concept of discrete harmonic
functions. These are defined by

    a(u_h, v) = 0   ∀v ∈ H¹_0(Ω_i) ∩ V_h,   i = 1, ..., J.    (2.11)

In matrix form, this is equivalent to

    A^{(i)T}_{12} x̄^{(i)} + A^{(i)}_{22} x̃^{(i)} = 0,   i = 1, ..., J.    (2.12)

The discrete harmonic functions are uniquely determined by their interface values
x̄. Discrete harmonic functions are closely related to the Schur complement in
the following sense: if x is the discrete harmonic extension of x̄, then from (2.12),
x̃ = -A_{22}^{-1} A_{12}^T x̄, and simple manipulations yield

    x^T A x = x̄^T S x̄.
In addition, the following lemma demonstrates that the discrete harmonic ex-
tensions (hence also the Schur energy norm) have the lowest energy among all
vectors having the same values on the interfaces.
Lemma 2.4.1. Let A be a symmetric positive definite matrix parti-
tioned into its interface and interior components,

    A = [ A_{11}  A_{12} ; A_{12}^T  A_{22} ],

and let S denote the Schur complement of A. Then

    x̄^T S x̄ = inf_{y: ȳ = x̄} y^T A y.
Proof. Given y such that ȳ = x̄, let us solve the following problem:
find w such that w̄ = 0 and

    ṽ^T A_{22} w̃ = -ṽ^T A_{12}^T ȳ - ṽ^T A_{22} ỹ   ∀ṽ.

As A_{22} is nonsingular, such a w exists and satisfies the equivalent minimization
of the functional

    F(v) = ½ v^T A v + y^T A v

over all vectors v with v̄ = 0. As this constraint is trivially satisfied by the zero
vector, we have F(w) ≤ 0. Moreover, the function z = y + w is the harmonic
extension of x̄, as z̄ = x̄ and

    z^T A v = y^T A v + w^T A v = 0   ∀v ∈ {v : v̄ = 0}.

Thus we obtain

    ⟨Az, z⟩ = ⟨A(y + w), y + w⟩ = ⟨Ay, y⟩ + 2F(w) ≤ ⟨Ay, y⟩.

From here the statement follows by the definition of the Schur complement.


Figures 2.1, 2.2, 2.3, 2.4 and 2.5 depict the harmonic extension of simple
functions prescribed on the interface; the functions exhibit the smoothing effect
of discrete harmonic extension following from Lemma 2.4.1. Note that in certain
situations the trivial extension may coincide with the discrete harmonic extension
(cf. Figure 2.4).
It is well known [93] that in order to achieve good convergence prop-
erties, the preconditioner has to be augmented with a mechanism of global ex-
change of information. Let us note that another advantage of nonoverlapping
methods is that on unstructured meshes it is much easier to algebraically construct
a coarse space for them than for the overlapping methods (cf. [63]). Treatment
of subdomains with nonmatching grids, using the currently popular so-called mortar
elements, is also easier and was recently studied in [4], [2], [53], [57], [19] and
many others. Mortar elements form a family of nonconforming finite element
methods. They are known to be as accurate as their conforming counterparts,
while offering more flexibility in refinement. The fine meshes on different subdo-
mains do not have to match over the interface, even the subdomains themselves
do not have to form a conforming coarse mesh. For the study of these methods
we direct the reader to the above references.
2.5 The Neumann-Neumann Method
The original Neumann-Neumann preconditioning operator $M_{NN}^{-1}$ by
De Roeck and Le Tallec [26] is based on the assembly (2.9) of the global Schur
complement from the local subdomain ones. It seems natural to precondition the
sum (2.9) by the sum of properly weighted inverses of $S^{(i)}$. If all subdomains


Figure 2.1: Harmonic extension of a corner peak.
Figure 2.2: Harmonic extension of a peak on the edge.


Figure 2.3: Harmonic extension of a function constant on one edge.
Figure 2.4: Harmonic extension of a function linear on one edge.


Figure 2.5: Harmonic extension of a function with random values on one edge.


are similar, this yields an efficient preconditioner, as
$$\mathrm{cond}\left(\sum_{i=1}^J S^{(i)-1}\sum_{j=1}^J S^{(j)}\right) = O(1).$$
The application of inverses of the local Schur complement matrices corresponds
on the continuous level to solving local Neumann problems on the subdomains.
This is why preconditioners of this type are called Neumann-Neumann. The
Neumann-Neumann preconditioner can be written as follows.
Algorithm 3 (Neumann-Neumann preconditioner, [26]). Given
$r \in V$, compute $z = M_{NN}^{-1}r$ as follows.
1. Distribute $r$ to the subdomains,
$$r_i = D_i^T N_i^T r.$$
2. Solve the local problems
$$S^{(i)} u_i = r_i \qquad (2.13)$$
on the subdomains.
3. Average the results by
$$z = \sum_{i=1}^J N_i D_i u_i.$$
The weight matrices $D_i$ are an important component of this algorithm, allowing
one to preserve good conditioning even when the subdomains differ in certain aspects,
such as the values of coefficients of the continuous operator. Some appropriate
choices for Di were discussed in [26] and [66]. We will mention the choices
suitable for problems with large variation of coefficients across the interfaces of
the subdomains in Section 2.7 and also in Chapter 6.
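The action of Algorithm 3 is simple enough to sketch in a few lines. In the NumPy sketch below, `subdomains` is an assumed list of `(idx, S_pinv, D)` triples: the global indices of a subdomain's interface dofs (encoding $N_i$), a (pseudo)inverse of the possibly singular local Schur complement $S^{(i)}$, and the diagonal weights $D_i$; all names are illustrative, and in practice the local solves are realized as Neumann subdomain problems rather than stored inverses.

```python
import numpy as np

def neumann_neumann_apply(r, subdomains):
    """One application z = M_NN^{-1} r of Algorithm 3 (a sketch)."""
    z = np.zeros_like(r)
    for idx, S_pinv, D in subdomains:
        r_i = D @ r[idx]      # step 1: distribute, r_i = D_i^T N_i^T r
        u_i = S_pinv @ r_i    # step 2: local (possibly singular) solve
        z[idx] += D @ u_i     # step 3: weighted averaging
    return z
```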
Since for the model problem with operator (1.3) the local Schur com-
plement matrices are typically singular, De Roeck and Le Tallec [26] used


an approximate LU decomposition obtained by replacing zero pivots in the
Gaussian decomposition by positive values.
This algorithm fails to provide any means of global distribution of in-
formation beyond that of block Jacobi type and, as expected, suffers from steep
deterioration of convergence as the number of subdomains increases.
2.6 EBE method
In engineering practice, the solver is often very closely tied to the finite
element software. One such method of the nonoverlapping domain decom-
position type is the element-by-element method originally proposed by Hughes,
Levit and Winget [50] and further studied by Hughes and Ferencz [48] and Tez-
duyar and Liou [84], [85].
This method, which may currently be becoming an industrial standard
in the field of iterative solvers applied to nonlinear problems, assumes that the
element stiffness matrices are available. Every element is treated as an inde-
pendent subdomain. The method is used as a preconditioner in the conjugate
gradient method. The main idea of the method is simple: The element matrices
are factored and the preconditioner applied element by element based on these
factorizations. The preconditioner may be written as
$$M = \mathrm{diag}(A)^{1/2}\left(\prod_{i=1}^{n_{el}} L(\bar A^{(i)})\right)\left(\prod_{i=1}^{n_{el}} D(\bar A^{(i)})\right)\left(\prod_{i=n_{el}}^{1} U(\bar A^{(i)})\right)\mathrm{diag}(A)^{1/2},$$
where $n_{el}$ denotes the number of elements in the system and $L, D, U$ denote the
factors of the Crout factorization of a matrix, $\bar A^{(i)} = L(\bar A^{(i)})\,D(\bar A^{(i)})\,U(\bar A^{(i)})$.
Here all the element matrices are understood as embedded in an n x n identity
matrix. The existence of the factorization is guaranteed because of the use of


positive definite local stiffness matrices $\bar A^{(i)}$ obtained from $A^{(i)}$ by scaling and
regularization
$$\bar A^{(i)} = I + \mathrm{diag}(A)^{-1/2}\left(A^{(i)} - \mathrm{diag}(A^{(i)})\right)\mathrm{diag}(A)^{-1/2}.$$
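The scaling and regularization step can be sketched as follows, assuming access to the element matrices and the assembled diagonal of $A$; the function and argument names are illustrative. The Crout factors of the matrices returned here are what enter the three products above.

```python
import numpy as np

def winget_regularize(A_elem, idx, diagA):
    """Scaled, regularized element matrix per the formula above.
    `A_elem` is one element stiffness matrix, `idx` its global dofs,
    `diagA` the diagonal of the assembled A (names are assumptions).
    The text notes this guarantees existence of the Crout factors."""
    d = 1.0 / np.sqrt(diagA[idx])
    off = A_elem - np.diag(np.diag(A_elem))
    return np.eye(len(idx)) + d[:, None] * off * d[None, :]
```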
As pointed out by Wathen [91], the method is essentially a multiplica-
tive version of the original Neumann-Neumann method described in Algorithm 3.
The difference is that the subdomain stiffness matrices in EBE are first scaled
and regularized.
The chief advantage of the EBE method is its very low setup overhead
compared to the typical two-level domain decomposition or multigrid methods.
The method is very useful in treatment of nonlinear or time dependent problems,
where refactorizations need to be performed. The cost of factorization of local
element stiffness matrices is amortized over the iterations per timestep and over
the number of timesteps, but because of the low factorization overhead due to
the small size of element matrices, it is possible to perform refactorizations more
often, which usually leads to faster overall convergence. The potential of this
method is most visible in the case of three-dimensional problems with low fractal
dimension of the mesh, where direct factorizations are extremely inefficient.
The whole procedure is quite suitable for implementation on parallel
architectures. This can be achieved by grouping the elements into noncontiguous
subgroups. Although the subgroups have to be processed sequentially, the local
stiffness matrices can then be factored in parallel for each element within a given
subgroup. Thus, both factorization and the action of the preconditioner can be
computed in parallel using this grouping of elements.
The obvious deficiency of the method is the lack of coarse space ex-
change of information. Despite this, computational experiments suggest that


EBE is suitable in application to nonlinear problems, where finding simple pre-
conditioners is critical, and where EBE proved superior to diagonal precondi-
tioning in terms of storage, convergence properties and sensitivity to penalty-type
boundary conditions (cf., e.g., [48]). Another shortcoming, preventing black-box
implementation, is EBE's reliance on the availability of the finite element data.
In Chapter 3 we present a fully algebraic method with a coarse space, somewhat
similar in spirit to EBE, with much improved convergence rate (independent of
the number of subdomains) proved under regularity-free assumptions for prob-
lems on unstructured meshes.
2.7 BDD as an Algebraic Method
The BDD preconditioner [62] is based on the original Neumann-Neumann
preconditioner by De Roeck and Le Tallec [26] described in Section 2.5. The algo-
rithm proposed in [26] suffers from the lack of global distribution of information
resulting in dramatic deterioration of performance with increasing number of
subdomains. As noted in Le Tallec [55], the method becomes impractical when
applied to problems with the number of subdomains larger than about 16. In
order to defeat this drawback, Mandel [62] added a coarse problem as follows.
Let $n_i = \dim V_i$, $0 \le m_i \le n_i$, and let $Z_i$ be $n_i \times m_i$ matrices of full column rank
such that
$$\mathrm{Ker}\,S^{(i)} \subset \mathrm{Range}\,Z_i, \quad i = 1,\dots,J, \qquad (2.14)$$
and let $W \subset V$ be defined by
$$W = \left\{v \in V : v = \sum_{i=1}^J N_i D_i u_i,\ u_i \in \mathrm{Range}\,Z_i\right\}.$$


The space W will play the role of a coarse space very much like in variational
multigrid methods. We say that $s \in V$ is balanced if
$$Z_i^T D_i^T N_i^T s = 0, \quad i = 1,\dots,J. \qquad (2.15)$$
The process of replacing $r$ by a balanced $s = r - Sw$, $w \in W$, is called balancing.
We are now ready to define the action $r \mapsto z = M^{-1}r$ of the BDD preconditioner.
Algorithm 4 (BDD preconditioner, [62]). Given $r \in V$, compute
$M^{-1}r$ as follows.
1. Pre-balance the original residual by solving the auxiliary problem for the un-
known vectors $\lambda_i \in \mathbb R^{m_i}$,
$$Z_i^T D_i^T N_i^T\left(r - S\sum_{j=1}^J N_j D_j Z_j \lambda_j\right) = 0, \quad i = 1,\dots,J. \qquad (2.16)$$
2. Set
$$s = r - S\sum_{j=1}^J N_j D_j Z_j \lambda_j, \qquad s_i = D_i^T N_i^T s, \quad i = 1,\dots,J. \qquad (2.17)$$
3. Find any solution $u_i$ for each of the local problems
$$S^{(i)} u_i = s_i, \quad i = 1,\dots,J. \qquad (2.18)$$
4. Post-balance the residual by solving the auxiliary problem for $\mu_i \in \mathbb R^{m_i}$,
$$Z_i^T D_i^T N_i^T\left(r - S\sum_{j=1}^J N_j D_j(u_j + Z_j\mu_j)\right) = 0, \quad i = 1,\dots,J. \qquad (2.19)$$
5. Average the result on the interfaces according to
$$z = \sum_{i=1}^J N_i D_i(u_i + Z_i\mu_i). \qquad (2.20)$$


If some $m_i = 0$, then $Z_i$ as well as the block unknowns $\lambda_i$ and $\mu_i$ are
void and the $i$-th block equation is taken out of (2.16) and (2.19).
An important design choice for both the original Neumann-Neumann
preconditioner and for BDD is the selection of weight matrices Di that form a
decomposition of unity on the interface space $V$,
$$\sum_{i=1}^J N_i D_i N_i^T = I. \qquad (2.21)$$
The most straightforward choice for Di is a diagonal matrix with the diagonal
elements being the reciprocal of the number of subdomains the degree of freedom
is associated with. A better choice, which also guarantees a convergence bound
independent of coefficient jumps between subdomains, is given in Theorem 2.7.3
below. For other possibilities, see [26] and notes in Chapter 6.
The presence of the coarse problem guarantees that the possibly singu-
lar local problems (2.18) are consistent. Indeed, after the pre-balancing Step 1.
of Algorithm 4, the vector $s \perp N_i D_i Z_i$ for all $i = 1,\dots,J$. Therefore $s_i = D_i^T N_i^T s$
is orthogonal to $\mathrm{Range}\,Z_i$, and since $\mathrm{Ker}(S^{(i)}) \subset \mathrm{Range}\,Z_i$, $s_i \in \mathrm{Range}(S^{(i)})$.
The presence of the coarse problem also guarantees that the result of
the algorithm does not depend on the choice of the solutions of (2.18), see [62].
In practice, the residual of the initial approximation should be balanced
first as in (2.19); then the first balancing step (2.16) in every iteration can be
omitted since the residual r received from the conjugate gradients algorithm is
already balanced.
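A compact sketch of one BDD application may clarify how the pieces interact. All interfaces below are assumptions for illustration: `S_mul(x)` multiplies by the assembled Schur complement, `parts` lists `(idx, D, Z, local_solve)` per subdomain (global interface indices encoding $N_i$, diagonal weights, coarse basis, and a routine returning any solution of $S^{(i)}u_i = s_i$), and `coarse_solve` solves the balancing systems (2.16)/(2.19) with the matrix $\left(Z_i^T D_i^T N_i^T S N_j D_j Z_j\right)_{ij}$ prefactored.

```python
import numpy as np

def bdd_apply(r, S_mul, parts, coarse_solve):
    """One action z = M^{-1} r of Algorithm 4 (a sketch)."""
    def balance(res):
        # lambda solves Z_i^T D_i^T N_i^T (res - S sum_j N_j D_j Z_j lam_j) = 0
        rhs = np.concatenate([Z.T @ (D @ res[idx]) for idx, D, Z, _ in parts])
        lam = coarse_solve(rhs)
        w, k = np.zeros_like(res), 0
        for idx, D, Z, _ in parts:
            m = Z.shape[1]
            w[idx] += D @ (Z @ lam[k:k + m])
            k += m
        return w

    s = r - S_mul(balance(r))              # steps 1-2: pre-balanced residual
    u = np.zeros_like(r)
    for idx, D, Z, local_solve in parts:   # step 3: local Neumann solves
        u[idx] += D @ local_solve(D @ s[idx])
    return u + balance(r - S_mul(u))       # steps 4-5: post-balance, average
```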
The addition of the coarse space results in a very robust method with op-
timal substructuring convergence properties, with the condition number of the
preconditioned operator bounded by $C(1+\log(H/h))^2$, as we will demonstrate.


Mandel [63] proved the following abstract estimate as the main tool for conver-
gence analysis of BDD.
Theorem 2.7.1. Algorithm 4 returns $z = M^{-1}r$, where $M$ is symmet-
ric positive definite and $\mathrm{cond}(M,S) \le C$, where
$$C = \sup\left\{\frac{\sum_{j=1}^J \left\|\sum_{i=1}^J N_j^T N_i D_i u_i\right\|^2_{S^{(j)}}}{\sum_{i=1}^J \|u_i\|^2_{S^{(i)}}} : u_i \in V_i,\ (v_i,u_i) = 0\ \forall v_i \in \mathrm{Ker}(S^{(i)}),\ (S^{(i)}u_i, v_i) = 0\ \forall v_i \in \mathrm{Range}(Z_i)\right\}. \qquad (2.22)$$
Note that a different proof of Theorem 2.7.1 can be found in Sec-
tion 4.4.2.
For the reader's convenience, we sketch the theory leading to the opti-
mal substructuring condition number estimate. Our exposition follows Mandel,
Brezina [66]. For more detailed analysis as well as computational experiments
using BDD, see Mandel, Brezina [65].
In applying the bound of Theorem 2.7.1, we will find useful the concepts
of glob and glob projection, defined as follows.
Definition 2.7.2 ([66]). Any vertex, edge, and, in the 3D case, face
of $\Gamma$ will be called a glob. A glob is understood to be relatively open; for example,
an edge does not contain its endpoints. We will also identify a glob with the set
of the degrees of freedom associated with it. The set of all globs will be denoted
by $\mathcal G$. For a glob $G \in \mathcal G$, define the glob projection as follows: for a vector $u \in V$,
$E_G u \in V$ is the vector that has the same values as $u$ for all degrees of freedom
in $G$, and all other degrees of freedom of $E_G u$ are zero. The glob projection in
terms of the local degrees of freedom is $E_G^{ij} = N_j^T E_G N_i : V_i \to V_j$.


Note that any two distinct globs from $\mathcal G$ are disjoint and it holds that
$$\Gamma = \bigcup_{i=1}^J \partial\Omega_i\setminus\partial\Omega = \bigcup_{G\in\mathcal G} G.$$
The mappings $E_G$, $E_G^{ij}$ correspond to zero-one matrices and satisfy
$$\sum_{G\in\mathcal G} E_G = I, \qquad N_j^T N_i = \sum_{G\in\mathcal G} E_G^{ij}, \qquad E_G^{ij} = E_G^{jj} E_G^{ij}, \qquad (2.23)$$
and
$$G \subset \partial\Omega_i \cap \partial\Omega_j \iff E_G^{ij} \ne 0, \qquad G \not\subset \partial\Omega_i \Rightarrow E_G^{ij} = 0. \qquad (2.24)$$
We are now ready to give an abstract bound in the case when the matrices $S^{(i)}$ are
scaled by arbitrary positive numbers $\alpha_i$. This corresponds to coefficient discon-
tinuities of arbitrary size between the subdomains. The theorem is formulated
and proved exclusively in algebraic terms.
Theorem 2.7.3. Let $\alpha_i > 0$, $i = 1,\dots,J$, $t \ge 1/2$, and let $E_G^{ij}$, $N_i$, $S^{(i)}$,
and $Z_i$ satisfy (2.8), (2.23), and (2.14). Define $D_i$ as the diagonal matrices
$$D_i = \sum_{G: E_G^{ii}\ne 0} d(i,G)\,E_G^{ii}, \qquad d(i,G) = \frac{\alpha_i^t}{\sum_{j: E_G^{jj}\ne 0}\alpha_j^t}, \qquad (2.25)$$
and assume that there exists a number $R$ so that for all $i,j = 1,\dots,J$ and all
$G \in \mathcal G$,
$$\alpha_i\,\|E_G^{ij}u_i\|^2_{S^{(j)}} \le \alpha_j\,R\,\|u_i\|^2_{S^{(i)}} \qquad (2.26)$$
for all $u_i$ such that $u_i \perp \mathrm{Ker}(S^{(i)})$, $S^{(i)}u_i \perp \mathrm{Range}\,Z_i$. Then the weight matrices
$D_i$ form a decomposition of unity (2.21), and the preconditioner defined by
Algorithm 4 satisfies
$$\mathrm{cond}(M,S) \le K^2L^2R, \qquad (2.27)$$
where $K = \max_i|\{j : N_j^T N_i \ne 0\}|$, and $L = \max_{i,j}|\{G : E_G^{ij} \ne 0\}|$.


Proof. The decomposition of unity property (2.21) follows from the
definition (2.25) and from (2.23),
$$\sum_{i=1}^J N_i D_i N_i^T = \sum_{i=1}^J\sum_{G: E_G^{ii}\ne 0} d(i,G)\,E_G = \sum_{G\in\mathcal G} E_G = I.$$
Let $j$ be fixed. Due to the bounded intersection of the subdomains, there are at
most $K$ nonzero terms in the sum $\sum_i N_j^T N_i D_i u_i$, and the triangle and Cauchy
inequalities imply that
$$\left\|\sum_{i=1}^J N_j^T N_i D_i u_i\right\|^2_{S^{(j)}} \le \left(\sum_{i=1}^J \|N_j^T N_i D_i u_i\|_{S^{(j)}}\right)^2 \le K\sum_{i=1}^J \|N_j^T N_i D_i u_i\|^2_{S^{(j)}}. \qquad (2.28)$$
If $E_G^{ij} \ne 0$, the coefficient $d(i,G)$ from (2.25) satisfies $d(i,G) \le \alpha_i^t/(\alpha_i^t + \alpha_j^t)$, and
it follows from (2.23) and from (2.26) that
$$\|N_j^T N_i D_i u_i\|_{S^{(j)}} \le \sum_{G: E_G^{ij}\ne 0}\frac{\alpha_i^t}{\alpha_i^t+\alpha_j^t}\|E_G^{ij}u_i\|_{S^{(j)}} \le \sum_{G: E_G^{ij}\ne 0}\frac{\alpha_i^t}{\alpha_i^t+\alpha_j^t}\left(\frac{\alpha_j}{\alpha_i}\right)^{1/2}R^{1/2}\|u_i\|_{S^{(i)}} \le LR^{1/2}\sup_{p>0}\frac{p^{1/2}}{1+p^t}\,\|u_i\|_{S^{(i)}} \le LR^{1/2}\|u_i\|_{S^{(i)}},$$
where $p = \alpha_j/\alpha_i$. Therefore, by (2.28),
$$\sum_{j=1}^J\left\|\sum_{i=1}^J N_j^T N_i D_i u_i\right\|^2_{S^{(j)}} \le K^2L^2R\sum_{i=1}^J \|u_i\|^2_{S^{(i)}},$$
which concludes the proof, in view of Theorem 2.7.1. $\square$
Note that the constant $K$ is the maximal number of subdomains $\Omega_j$
adjacent to any subdomain $\Omega_i$, plus one, and $L$ is the maximal number of globs
in any $\partial\Omega_i \cap \partial\Omega_j$. If $t > 1/2$, the estimate (2.27) can be slightly improved; in
particular, if $t = 1$, analogously to the method of De Roeck and Le Tallec [26],
one has $\mathrm{cond}(M,S) \le K^2L^2R/2$.
To apply Theorem 2.7.1, we first need to replace the $S^{(i)}$ norm by the
scaled $H^{1/2}$ norm. This is a standard result [10, 31, 92], which we state here for
reference in an $\alpha_i$-scaled form suitable for our purposes.
Lemma 2.7.4. There exist constants $c > 0$, $C$ independent of $H$ or $h$
so that
$$c\,\alpha_i\,|u|^2_{1/2,\partial\Omega_i} \le \|u\|^2_{S^{(i)}} \le C\,\alpha_i\,|u|^2_{1/2,\partial\Omega_i} \quad \forall u \in V_h(\partial\Omega_i).$$
To derive the fundamental inequality (2.26) assumed in Theorem 2.7.3,
we identify (by abuse of notation) $V$ with $V_h(\Gamma)$ and $V_i$ with $V_h(\partial\Omega_i)$. Then the
glob projections are $E_G^{ij} : V_h(\Gamma) \to V_h(\Gamma)$, and (2.26) becomes a bound on the
increase of the $H^{1/2}$ norm when a function in $V_h(\partial\Omega_i)$ is changed by setting its
values to zero on all nodes of $\partial\Omega_i\setminus G$.
We first consider the two-dimensional case, $\Omega \subset \mathbb R^2$. Since $\partial\Omega_i$ is one-
dimensional, we may use the properties of the space $V_h(0,H)$ of piecewise linear
functions on a uniform mesh with step $h$ on the interval $[0,H]$. The following
form of the Discrete Sobolev Inequality was proved by Dryja [28].
Lemma 2.7.5. There exists a constant $C$ such that
$$\|u\|_{L^\infty(0,H)} \le C\left(1+\log\frac Hh\right)^{1/2}\|u\|_{H^{1/2}(0,H)} \quad \forall u \in V_h(0,H).$$
We will also need the following bound for the $H^{1/2}$ norm of the extension
by zero from an interval to the whole $\mathbb R$, proved by Bramble, Pasciak, and
Schatz [10, Lemma 3.5].
Lemma 2.7.6. There exists a constant $C$ such that for all $u \in V_h(0,H)$
satisfying $u(0) = u(H) = 0$, $u = 0$ outside $(0,H)$,
$$|u|^2_{1/2,\mathbb R} \le C\left(1+\log\frac Hh\right)\|u\|^2_{L^\infty(0,H)} + |u|^2_{1/2,(0,H)}.$$
An estimate of the $H^{1/2}$ norm of a function, obtained by sampling the
value of a given function at one point, follows easily.
Lemma 2.7.7. There exists a constant $C$ such that for all $u \in V_h(0,H)$,
$0 < h \le H$, and $v_0 \in V_h(\mathbb R)$ defined by $v_0(0) = u(0)$, $v_0(x) = 0$ for $|x| \ge h$,
$v_0$ linear in $[-h,0]$ and in $[0,h]$,
$$|v_0|^2_{1/2,\mathbb R} \le C\left(1+\log\frac Hh\right)\|u\|^2_{1/2,(0,H)}.$$
Proof. Let $L = \|u\|_{L^\infty(0,H)}$. It follows from Lemma 2.7.6, applied on
the interval $(-h,h)$, that
$$|v_0|^2_{1/2,\mathbb R} \le C\|v_0\|^2_{L^\infty(-h,h)} + |v_0|^2_{1/2,(-h,h)}, \qquad (2.29)$$
the logarithmic factor being $O(1)$ here. Using the linearity of $v_0$, we obtain
$$|v_0|^2_{1/2,(-h,h)} = \int_{-h}^{h}\int_{-h}^{h}\frac{|v_0(s)-v_0(t)|^2}{|s-t|^2}\,ds\,dt \le 4L^2, \qquad (2.30)$$
because $\|v_0\|^2_{L^\infty(-h,h)} = |v_0(0)|^2 \le L^2$. Thus, $|v_0|^2_{1/2,(-h,h)} \le CL^2$. But $L^2 \le
C(1+\log\frac Hh)\|u\|^2_{1/2,(0,H)}$ by Lemma 2.7.5, which concludes the proof. $\square$
By subtracting such spikes at the endpoints, we can extend Lemma 2.7.6
to the case when the values of $u$ at the endpoints are nonzero.
Lemma 2.7.8. There exists a constant $C$ so that for $u \in V_h(0,H)$ and
$w \in V_h(\mathbb R)$ such that $w = u$ on $[h, H-h]$, and $w(x) = 0$ for $x \le 0$, $x \ge H$,
$$|w|^2_{1/2,\mathbb R} \le C\left(1+\log\frac Hh\right)^2\|u\|^2_{1/2,(0,H)}.$$
Proof. Define $\tilde u(x)$ to be zero for $x \in (-\infty,-h)\cup(H+h,\infty)$, $\tilde u = u$
on $[0,H]$, and linear in $[-h,0]$ and $[H,H+h]$. Further, define $v_0$ and $v_H$ by
$$v_0(x) = \begin{cases}\tilde u(0), & x = 0,\\ 0, & |x| \ge h,\end{cases} \qquad v_0 \text{ linear in } [-h,0] \text{ and in } [0,h],$$
$$v_H(x) = \begin{cases}\tilde u(H), & x = H,\\ 0, & |x-H| \ge h,\end{cases} \qquad v_H \text{ linear in } [H-h,H] \text{ and in } [H,H+h].$$
Writing $w$ as $w = \tilde u - v_0 - v_H$, and applying Lemma 2.7.6 and Lemma 2.7.7,
we obtain
$$|w|^2_{1/2,\mathbb R} \le 3\left(|\tilde u|^2_{1/2,\mathbb R} + |v_0|^2_{1/2,\mathbb R} + |v_H|^2_{1/2,\mathbb R}\right) \le C\left(1+\log\frac Hh\right)\left(\|u\|^2_{L^\infty(0,H)} + \|u\|^2_{1/2,(0,H)}\right).$$
Application of Lemma 2.7.5 to the term $\|u\|_{L^\infty(0,H)}$ concludes the proof. $\square$
We now have the tools to estimate the $H^{1/2}$ norm of the glob projections
$E_G$. This shows that an arbitrary function in $V_h(\partial\Omega_i)$ can be decomposed into
its glob parts with only a small penalty of $H^{1/2}$ norm increase.
Theorem 2.7.9. Let $\Omega \subset \mathbb R^d$, $d \in \{2,3\}$. Then there exists a constant
$C$ not dependent on $h$ or $H$, so that for any glob $G \in \mathcal G$ and for all $u \in V_h(\partial\Omega_i)$,
$$\|E_G u\|^2_{1/2,\partial\Omega_i} \le C\left(1+\log\frac Hh\right)^2\|u\|^2_{1/2,\partial\Omega_i}.$$


Proof. In the 2D case, the proposition follows by using a mapping of
$\partial\Omega_i$ onto an interval so that $G$ maps to $(0,H)$, from Lemma 2.7.8 for $G$ being
an edge, and from Lemma 2.7.7 for $G$ being a vertex.
In the 3D case, the proposition was proved for the case of $G$ being a
face of $\partial\Omega_i$ as Lemma 4.3 in [12]. In the case of $G$ being an edge or a vertex of
$\partial\Omega_i$, the proof follows from Lemma 4.2 and the proof of Lemma 4.1 of [12]. $\square$
The bound on the condition number of the BDD algorithm follows.
Theorem 2.7.10. Let $\Omega \subset \mathbb R^d$, $d \in \{2,3\}$, and let the weight matrices
$D_i$ be diagonal with the entries given by (2.25). Then there exists a constant $C$
independent of $H$, $h$ and $\alpha_i$, so that the condition number of the BDD method
satisfies
$$\mathrm{cond}(M,S) \le C\left(1+\log\frac Hh\right)^2.$$
Proof. We only need to verify the assumption (2.26) of Theorem 2.7.3.
Lemma 2.7.4 allows us to replace the $S^{(i)}$ norms by the $H^{1/2}(\partial\Omega_i)$ seminorms, which
may in turn be replaced by the $H^{1/2}(\partial\Omega_i)$ norms, owing to the Poincaré inequal-
ity (A.7). Now it suffices to use Theorem 2.7.9. $\square$
2.8 DD as a 2-level Multigrid
Another method similar to BDD was proposed by Mandel [64]. As in
BDD, this method also treats the local problems in an additive way, while a coarse
problem of modest size is added in multiplicative fashion. The main difference
between the two methods is that BDD requires subdomain local stiffness matrices
as input, whereas the method of Mandel described in [64] uses submatrices of A
instead.


The method may be written as follows:
Algorithm 5 (Hybrid Method). For a given $r \in V$, compute $z$ as
follows:
1. Compute the coarse level correction
$$z_0 \in V_0 : \quad a(z_0, v_0) = (r, v_0) \quad \forall v_0 \in V_0. \qquad (2.31)$$
2. Compute the local subdomain solutions
$$u_i \in V_i : \quad a(u_i, v_i) = (r, v_i) - a(z_0, v_i) \quad \forall v_i \in V_i, \quad i = 1,\dots,J. \qquad (2.32)$$
3. Gather the solutions together
$$u = \sum_{i=1}^J u_i. \qquad (2.33)$$
4. Compute and return the corrected solution
$$\tilde z_0 \in V_0 : \quad a(u - \tilde z_0, v_0) = (r, v_0) \quad \forall v_0 \in V_0, \qquad (2.34)$$
$$z = u - \tilde z_0. \qquad (2.35)$$
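In matrix terms, one application of Algorithm 5 can be sketched as below. Here `P0` is an assumed $n\times m$ basis matrix whose columns span $V_0$, `local_solvers` is a list of `(idx, solve)` pairs realizing the subdomain corrections on submatrices of $A$, and everything is dense NumPy for simplicity; the names are illustrative, not the thesis implementation.

```python
import numpy as np

def hybrid_apply(r, A, P0, local_solvers):
    """One action of Algorithm 5 (a dense sketch)."""
    A0 = P0.T @ A @ P0
    z0 = P0 @ np.linalg.solve(A0, P0.T @ r)              # (2.31)
    d = r - A @ z0
    u = np.zeros_like(r)
    for idx, solve in local_solvers:                     # (2.32)-(2.33)
        u[idx] += solve(d[idx])
    t0 = P0 @ np.linalg.solve(A0, P0.T @ (A @ u - r))    # (2.34)
    return u - t0                                        # (2.35)
```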
Algorithm 5 is nothing but a two-level variational multigrid for the
problem $Sz = b$, with a special smoother defined by (2.32), (2.33). Steps (2.31)
and (2.34) play the role of a coarse-grid correction. The fact that a method
can be viewed as either a domain decomposition or multigrid is not a coinci-
dence; the development of the recent years shows a tendency towards unification
of convergence theory for multigrid and domain decomposition methods (e.g.,
Bramble, Pasciak, Wang, Xu [14], Wang [90]). In Xu [94], a selection of spaces
is described with which the multiplicative domain decomposition and multigrid
methods coincide.


2.9 Other Domain Decomposition Methods
The wealth of multilevel literature is astonishing. Of the methods
strongly related to BDD, we mention two here: The method of Dryja and Wid-
lund [32] uses the same coarse space $W$ as BDD and weight exponent $t = 1/2$
in (2.25), but instead of using the local Schur complement matrices $S^{(i)}$ in (2.18),
it uses $S^{(i)} + c_i M^{(i)}$ with $M^{(i)}$ positive definite. This avoids solving singular prob-
lems.
Sarkis [79] obtained an estimate for a method similar to BDD for non-
conforming elements with any t > 1/2.
As the main purpose of this brief section is to give a few references to
other material deserving attention but not covered in this thesis, we note that do-
main decomposition methods have been developed and successfully implemented
for mixed formulations. There is also the dual approach to domain decomposition
advocated by Farhat and Roux [39]. This approach is also suitable for solving
problems arising from mixed formulations. For the application of domain decom-
position techniques to mixed finite element methods, see [43], [34], [25] and the
references therein. The dual approach has been introduced in [39] and further
studied in [69], [36] and [55].
For domain decomposition methods on nonconforming finite element
discretizations, see [4], [2], [53], [57], [19] and their references.


3. Fully Black-box Overlapping Schwarz
Method
In this chapter, we propose a practical solver based on the framework
of overlapping Schwarz methods. The practical disadvantage of the existing
methods is that their setup usually requires knowledge of the mesh structure for
generating the overlaps. Another issue troubling practical usability of overlap-
ping Schwarz methods is the construction of a coarse space. The possibilities
are either to start from a coarse finite element mesh forming nonoverlapping
subdomains and obtain the fine discretization mesh by refining or to create an
independent coarse mesh. The first approach requires the iterative solver to
have control over the discretization, while the second brings about the technical
difficulties of having to deal with nonnested spaces.
We will design and analyze a black-box overlapping Schwarz method
with a new efficient coarse space avoiding the aforementioned pitfalls of the
traditional approach. The creation of overlapping subdomains will be simplified
here. We will start from a system of nonoverlapping subdomains, but we will not
use the common construction of overlapping subdomains by adding new layers
of elements to nonoverlapping ones. Instead, the overlaps will be obtained by
a special smoothing procedure based on the stiffness matrix A. The desired
amount of overlap will then be achieved by algebraic means and controlled
by the amount of smoothing done. We will also show that the assumed system of


nonoverlapping subdomains can efficiently be generated as a part of the method
itself. The material in this section is based on ongoing joint efforts with
Petr Vaněk [17].
3.1 Overlapping Schwarz Method with a Coarse Space.
The purpose of this section is to specify requirements on the coarse
space and overlapping subdomains that will allow us to prove uniform conver-
gence. Requirements on the coarse space will be formulated in terms of its basis
functions, keeping in mind that our final goal is a black-box method efficient
on unstructured meshes. For this reason, we avoid the assumption on the $L^\infty$
boundedness of the gradient of the basis functions.
Before we introduce notation specific to this section, let us briefly recall
some assumptions on the problem to be solved. Let $\Omega \subset \mathbb R^d$, $d \in \{2,3\}$, be
a Lipschitz domain and $\mathcal T_h$ be a quasiuniform finite element mesh on $\Omega$ of a
characteristic meshsize $h$. Let $V_h$ be a P1 or Q1 finite element space associated
with the mesh $\mathcal T_h$ with zero Dirichlet boundary conditions imposed at some finite
element nodes $v_i \in \Gamma_D \subset \partial\Omega$. Those nodes will be referred to as constrained
nodes. For simplicity, we assume that the finite element basis functions $\phi_i$ are
scaled so that $\|\phi_i\|_{L^\infty(\Omega)} = 1$. We consider the finite element discretization
$$Ax = b \qquad (3.1)$$
of the following elliptic model problem: Find $u \in V_h$ such that
$$a(u,v) = (f,v)_{L^2(\Omega)} \quad \forall v \in V_h, \qquad (3.2)$$


where
$$a(u,v) = \sum_{i=1}^d\int_\Omega a(x)\,\frac{\partial u}{\partial x_i}\frac{\partial v}{\partial x_i}\,dx, \qquad 0 < C_1 \le a(x) \le C_2 \text{ for all } x \in \Omega,$$
and $V_h$ denotes the finite element space with resolution $h$. Clearly,
$$C_1|v|^2_{H^1(\Omega)} \le a(v,v) \le C_2|v|^2_{H^1(\Omega)}.$$
Let us consider a system of overlapping subdomains $\{\Omega_j\}_{j=1}^J$ covering the com-
putational domain $\Omega$. The coarse space $V_0 \subset V_h$ will be defined by its basis, i.e.
$$V_0 = \mathrm{span}\{\Phi_i\}_{i=1}^J,$$
and local fine level spaces $\{V_j\}_{j=1}^J$ will be determined by the subdomains $\Omega_j$ via
$$V_j = V_h \cap H^1_{0,\partial\Omega_j\setminus\Gamma_N}(\Omega_j), \quad j = 1,\dots,J, \qquad (3.3)$$
where $\Gamma_N = \partial\Omega\setminus\Gamma_D$ is the part of the boundary with the natural boundary
condition imposed.
For the sake of parallelism, we assume that we have a system
of index sets $\{\mathcal C_j\}_{j=1}^{n_c}$ such that
(1) $\bigcup_{i=1}^{n_c}\mathcal C_i = \{1,\dots,J\}$, $\mathcal C_i \cap \mathcal C_j = \emptyset$, $i \ne j$;
(2) $V_k \perp_A V_l$ for every $k,l \in \mathcal C_j$, $k \ne l$.
Denoting by $P_i$ the $a(\cdot,\cdot)$-orthogonal projections onto $V_i$,
the error propagation operator of the method we will analyze can be written as
$$T = (I - P_0)\prod_{i=1}^{n_c}\left(I - \sum_{j\in\mathcal C_i}P_j\right). \qquad (3.4)$$
Note that due to the $A$-orthogonality of the spaces $V_j$, $V_k$ for $j,k \in \mathcal C_i$, we have
$$I - \sum_{j\in\mathcal C_i}P_j = \prod_{j\in\mathcal C_i}(I - P_j), \qquad (3.5)$$


so the method may be viewed as either a hybrid or purely multiplicative. This
fact allows for parallelism in the computation of subdomain error corrections.
The following two assumptions sum up our requirements on the sys-
tem of subdomains and on the coarse space basis functions that will be sufficient
for proving uniform convergence of the algorithm with error propagation opera-
tor (3.4).
Assumption 3.1.1 (Subdomain Geometry). Let $\Omega$ be a union of sim-
ply connected clusters $\Omega_j$ of finite elements such that
a) $\mathrm{diam}(\Omega_j) \le CH$, $j = 1,\dots,J$.
b) $\exists c > 0\ \forall x \in \Omega\ \exists\Omega_j : x \in \Omega_j$ and $\mathrm{dist}(x, \partial\Omega_j\setminus\partial\Omega) \ge cH$.
c) $\exists K_1, K_2 > 0 : \forall x \in \Omega$ the ball $B(x, C_rH) = \{y \in \Omega : \mathrm{dist}(y,x) \le C_rH\}$
intersects at most $K_1 + K_2C_r^d$ subdomains $\Omega_j$.
d) $\mathrm{meas}(\Omega_j) \ge CH^d$, $j = 1,\dots,J$.
Assumption 3.1.1 c) will be used in the following context: an object of a diameter
$O(H)$ intersects at most $O(1)$ subdomains $\Omega_i$.
Assumption 3.1.2 (Coarse Space Basis Functions). We assume
that the basis functions $\Phi_i$ of the coarse space $V_0$ satisfy
a) $|\Phi_i|_{H^1(\Omega)} \le CH^{d/2-1}$, $\|\Phi_i\|_{L^2(\Omega)} \le CH^{d/2}$.
b) There is a domain $\Omega^{int} \subset \Omega$ such that $\sum_i\Phi_i(x) = 1$ for every $x \in \Omega^{int}$ and
$\mathrm{dist}(x, \Gamma_D) \le CH$ for every $x \in \Omega\setminus\Omega^{int}$.
c) $\mathrm{supp}(\Phi_i) \subset \Omega_i$.
Note that the assumption can trivially be satisfied by P1 or Q1 finite element
basis functions. We will, however, design a coarse space basis more appropri-
ate for unstructured meshes. As we will demonstrate, the basis of smoothed


aggregation functions described in Section 3.2 is a good candidate.
In the rest of this section we will prove the following theorem:
Theorem 3.1.3. Under Assumptions 3.1.2 and 3.1.1, for the error
propagation operator (3.4) of Algorithm 8 defined in Section 3.2 below it holds
that
$$\|T\|_A \le 1 - \frac1C < 1,$$
with a constant $C$ independent of $H$, $h$.
The convergence proof will rely on the use of Lions' lemma [5], for
application of which we need the following simple existence result:
Lemma 3.1.4. Let Assumption 3.1.1 be satisfied. Then there exists
a set of functions $\psi_j \in W^{1,\infty}(\Omega)$ such that
1. $|\psi_j|_{W^{1,\infty}(\Omega)} \le CH^{-1}$,
2. $\sum_{j=1}^J\psi_j = 1$ on $\Omega$,
3. $\psi_j = 0$ on $\Omega\setminus\Omega_j$.
Proof. For each $\Omega_j$ we define
$$\hat\psi_j(x) = \begin{cases} H^{-1}\,\mathrm{dist}(x, \partial\Omega_j\setminus\partial\Omega), & x \in \Omega_j,\\ 0, & x \in \Omega\setminus\Omega_j.\end{cases}$$
Due to parts a) and c) of Assumption 3.1.1,
$$\sum_{j=1}^J\hat\psi_j \le C \quad \text{on } \Omega, \qquad (3.6)$$
and from part b), we also have
$$\sum_{j=1}^J\hat\psi_j \ge c \quad \text{on } \Omega. \qquad (3.7)$$


We will show that
$$|\hat\psi_j|_{W^{1,\infty}(\Omega)} \le CH^{-1}. \qquad (3.8)$$
Consider two points $u, v \in \Omega_j$. Without loss of generality, we assume $\hat\psi_j(u) \le
\hat\psi_j(v)$. Then we have
$$\hat\psi_j(u) = H^{-1}\mathrm{dist}(u, \partial\Omega_j\setminus\partial\Omega) = H^{-1}\mathrm{dist}(u, P)$$
for some point $P \in \partial\Omega_j\setminus\partial\Omega$. Further, using the triangle inequality,
$$\hat\psi_j(v) = H^{-1}\mathrm{dist}(v, \partial\Omega_j\setminus\partial\Omega) \le H^{-1}\mathrm{dist}(v, P) \le H^{-1}\left(\mathrm{dist}(v,u) + \mathrm{dist}(u,P)\right) \le H^{-1}\mathrm{dist}(u,v) + \hat\psi_j(u).$$
Therefore,
$$|\hat\psi_j|_{\mathrm{Lip}(\Omega)} \le H^{-1}.$$
Now, (3.8) follows from the well-known equivalence $|\cdot|_{\mathrm{Lip}(\Omega)} \sim |\cdot|_{W^{1,\infty}(\Omega)}$.
Let us define
$$w(x) = \frac{1}{\sum_{j=1}^J\hat\psi_j(x)}, \quad x \in \Omega.$$
Due to (3.6) and (3.7),
$$\|w\|_{L^\infty(\Omega)} \le C. \qquad (3.9)$$
Further, from (3.7), (3.8) and the bounded intersections of the subdomains $\Omega_j$, denoting
the Euclidean norm in $\mathbb R^d$ by $\|\cdot\|$,
$$\|\nabla w(x)\| \le \left\|\nabla\sum_{j=1}^J\hat\psi_j(x)\right\|\left(\min_{y\in\Omega}\sum_{j=1}^J\hat\psi_j(y)\right)^{-2} \le C\sum_{j: x\in\Omega_j}\|\nabla\hat\psi_j(x)\| \le CH^{-1} \qquad (3.10)$$
for every $x \in \Omega\setminus B$, where $B$ is a set of zero measure. Finally, we set
$$\psi_j = w\,\hat\psi_j, \quad j = 1,\dots,J.$$
Statements 2. and 3. of the lemma are trivially satisfied by the functions $\psi_j$. Fur-
ther, (3.8), (3.9) and (3.10) imply
$$\|\nabla\psi_j(x)\| = \|w(x)(\nabla\hat\psi_j)(x) + \hat\psi_j(x)(\nabla w)(x)\| \le \|\nabla\hat\psi_j(x)\|\,|w(x)| + |\hat\psi_j(x)|\,\|\nabla w(x)\| \le CH^{-1}$$
almost everywhere, which proves statement 1. of the lemma. $\square$
Let us define the linear interpolation operator $Q : H^1(\Omega) \to V_0$ by
$$Qu = \sum_{i=1}^J\alpha_i(u)\Phi_i, \quad \text{where } \alpha_i(u) = \frac{1}{\mathrm{meas}(\Omega_i)}\int_{\Omega_i}u(x)\,dx. \qquad (3.11)$$
As by the Cauchy-Schwarz inequality
$$\left|\int_{\Omega_i}u(x)\,dx\right| \le |(u,1)_{L^2(\Omega_i)}| \le \mathrm{meas}(\Omega_i)^{1/2}\|u\|_{L^2(\Omega_i)}$$
and by Assumption 3.1.1 d)
$$\mathrm{meas}(\Omega_i) \ge CH^d,$$
we obtain the following bound for the coefficients of the interpolation $Q$:
$$|\alpha_i(u)| \le CH^{-d/2}\|u\|_{L^2(\Omega_i)}. \qquad (3.12)$$


The proof of the following lemma is essentially standard, except for
certain technical difficulties stemming from allowing rather general geometry of
the subdomains $\Omega_j$.
Lemma 3.1.5. Under Assumptions 3.1.2, 3.1.1, for the interpolation
operator $Q$ defined by (3.11) and every $u \in H^1_{0,\Gamma_D}(\Omega)$ it holds that
$$\|u - Qu\|_{L^2(\Omega)} \le CH|u|_{H^1(\Omega)} \qquad (3.13)$$
and
$$|Qu|_{H^1(\Omega)} \le C|u|_{H^1(\Omega)}, \qquad (3.14)$$
where $C$ is a constant independent of $H$ and $h$.
Remark 3.1.6. The inequality (3.13) is one of the forms of the so-
called weak approximation property (cf. [15]). It can be contrasted with the
traditional approximation property
$$\|u - Qu\|_{H^1(\Omega)} \le CH|u|_{H^2(\Omega)}$$
used in the multigrid and finite element literature.
Proof. Let us set $B = \Omega\setminus\Omega^{int}$, where $\Omega^{int}$ is the domain introduced in
Assumption 3.1.2 b). Further we define
$$\mathcal B = \{i : \Omega_i \cap B \ne \emptyset\}, \qquad B' = \bigcup_{i\in\mathcal B}\Omega_i, \qquad W = \sup_{x\in B'}\{\mathrm{dist}(x,\Gamma_D)\}$$
and set
$$B_0 = \{x \in \Omega : \mathrm{dist}(x,\Gamma_D) \le W\}.$$
From Assumption 3.1.1 it immediately follows that $W \le CH$ and therefore the
Poincaré inequality yields
$$\|u\|_{L^2(B)} \le \|u\|_{L^2(B_0)} \le C(\Gamma_D)H|u|_{H^1(B_0)}. \qquad (3.15)$$
Then, the restriction of $Qu$ onto $B$ can be expressed as
$$(Qu)(x) = \sum_{i\in\mathcal B}\alpha_i(u)\Phi_i(x), \quad x \in B.$$
Further, let us set $\mathcal N_i = \{j : \Omega_j \cap \Omega_i \ne \emptyset\}$. The inequalities
$$\|\Phi_i\|_{L^2(\Omega)} \le CH^{d/2}, \qquad |\alpha_i(u)||\alpha_j(u)| \le \frac12\left(\alpha_i^2(u) + \alpha_j^2(u)\right)$$
together with bounded intersections and (3.12) yield:
$$\|Qu\|^2_{L^2(B)} = \sum_{i\in\mathcal B}\sum_{j\in\mathcal N_i\cap\mathcal B}(\alpha_i(u)\Phi_i, \alpha_j(u)\Phi_j)_{L^2(\Omega)} \le \sum_{i\in\mathcal B}\sum_{j\in\mathcal N_i\cap\mathcal B}|\alpha_i(u)||\alpha_j(u)|\,\|\Phi_i\|_{L^2(\Omega)}\|\Phi_j\|_{L^2(\Omega)}$$
$$\le CH^d\sum_{i\in\mathcal B}\sum_{j\in\mathcal N_i\cap\mathcal B}\frac12\left(\alpha_i^2(u)+\alpha_j^2(u)\right) \le CH^d\max_i\{\mathrm{card}(\mathcal N_i)\}\sum_{i\in\mathcal B}\alpha_i^2(u) \le C\sum_{i\in\mathcal B}\|u\|^2_{L^2(\Omega_i)} \le C\|u\|^2_{L^2(B_0)}. \qquad (3.16)$$
Using the last inequality together with (3.15) gives
$$\|(I-Q)u\|_{L^2(B)} \le \|u\|_{L^2(B)} + \|Qu\|_{L^2(B)} \le CH|u|_{H^1(B_0)}. \qquad (3.17)$$
Analogously to estimate (3.16), using
$$|\Phi_i|_{H^1(\Omega)} \le CH^{d/2-1}$$
in place of $\|\Phi_i\|_{L^2(\Omega)} \le CH^{d/2}$ (see Assumption 3.1.2), we arrive at
$$|Qu|^2_{H^1(B)} \le CH^{-2}\|u\|^2_{L^2(B_0)} \le C|u|^2_{H^1(B_0)}, \qquad (3.18)$$
where in the last step (3.15) has been used.


In the rest of the proof we will verify (3.17) and (3.18) on the domain
$\Omega^{int}$ (see Assumption 3.1.2). For the convenience of the proof, let us consider
the extension $u_E$ of the function $u \in H^1_{0,\Gamma_D}(\Omega)$ satisfying
$$|u_E|_{H^1(\mathbb R^d)} \le C(\Omega)|u|_{H^1(\Omega)}, \qquad u_E = u \text{ on } \Omega. \qquad (3.19)$$
Let us recall that $\mathcal N_i = \{j : \Omega_j \cap \Omega_i \ne \emptyset\}$. For $i = 1,\dots,J$ we define $B_i' =
\bigcup_{j\in\mathcal N_i}\Omega_j$ and $B_i$ to be a ball circumscribed about $B_i'$. From Assumption 3.1.1 it
immediately follows that
$$\mathrm{diam}(B_i) \le CH,$$
and therefore we have the Friedrichs inequality in the form
$$\|v\|_{L^2(B_i)} \le CH|v|_{H^1(B_i)} \quad \forall v \in \left\{v \in H^1(B_i) : \int_{B_i}v\,dx = 0\right\}. \qquad (3.20)$$
Further, due to Assumption 3.1.1 a), c), the intersections of the balls $B_i$ are bounded.
For every $j = 1,\dots,J$ we define
$$c_j = \frac{1}{\mathrm{meas}(B_j)}\int_{B_j}u_E\,dx, \qquad u_j = u_E - c_j. \qquad (3.21)$$
Then, the Friedrichs inequality (3.20) holds for every $u_j$. Due to Assump-
tion 3.1.2 b), for $x \in \Omega_i\cap\Omega^{int}$ it holds that
$$(Qu)(x) = (Qu_i)(x) + (Qc_i)(x) = \sum_{j\in\mathcal N_i}\alpha_j(u_i)\Phi_j(x) + c_i\sum_{j\in\mathcal N_i}\Phi_j(x) = (Qu_i)(x) + c_i. \qquad (3.22)$$
Therefore,
$$\|(I-Q)u\|^2_{L^2(\Omega^{int})} \le \sum_{i=1}^J\|(I-Q)u\|^2_{L^2(\Omega_i\cap\Omega^{int})} = \sum_{i=1}^J\|(I-Q)(u_i+c_i)\|^2_{L^2(\Omega_i\cap\Omega^{int})} = \sum_{i=1}^J\|(I-Q)u_i\|^2_{L^2(\Omega_i\cap\Omega^{int})}$$
$$\le 2\sum_{i=1}^J\left(\|u_i\|^2_{L^2(B_i)} + \|Qu_i\|^2_{L^2(\Omega_i\cap\Omega^{int})}\right). \qquad (3.23)$$
Further,
$$\|Qu_i\|^2_{L^2(\Omega_i\cap\Omega^{int})} \le \left\|\sum_{j\in\mathcal N_i}\alpha_j(u_i)\Phi_j\right\|^2_{L^2(\Omega)} \le \left(\sum_{j\in\mathcal N_i}|\alpha_j(u_i)|\,\|\Phi_j\|_{L^2(\Omega)}\right)^2 \le \mathrm{card}(\mathcal N_i)\sum_{j\in\mathcal N_i}\alpha_j^2(u_i)\|\Phi_j\|^2_{L^2(\Omega)}$$
$$\le CH^{-d}H^d\sum_{j\in\mathcal N_i}\|u_i\|^2_{L^2(\Omega_j)} \le C\|u_i\|^2_{L^2(B_i)} \qquad \text{[Assumption 3.1.2 a) and (3.12)]}.$$
Substituting the last inequality into (3.23) and using the Friedrichs inequal-
ity (3.20) together with the bounded intersections of the balls $\{B_i\}_{i=1}^J$, we get
$$\|(I-Q)u\|^2_{L^2(\Omega^{int})} \le C\sum_{i=1}^J\|u_i\|^2_{L^2(B_i)} \le CH^2\sum_{i=1}^J|u_i|^2_{H^1(B_i)} = CH^2\sum_{i=1}^J|u_E|^2_{H^1(B_i)} \le CH^2|u|^2_{H^1(\Omega)}.$$
The last inequality together with (3.17) proves (3.13). Similarly,
$$|Qu|^2_{H^1(\Omega^{int})} \le \sum_{i=1}^J|Qu|^2_{H^1(\Omega_i\cap\Omega^{int})} = \sum_{i=1}^J|Q(u_i+c_i)|^2_{H^1(\Omega_i\cap\Omega^{int})} \qquad \text{[used (3.21)]}$$
$$= \sum_{i=1}^J|Qu_i + c_i|^2_{H^1(\Omega_i\cap\Omega^{int})} \qquad \text{[used (3.22)]}$$
$$= \sum_{i=1}^J|Qu_i|^2_{H^1(\Omega_i\cap\Omega^{int})} \le \sum_{i=1}^J\left|\sum_{j\in\mathcal N_i}\alpha_j(u_i)\Phi_j\right|^2_{H^1(\Omega)} \le \sum_{i=1}^J\left(\sum_{j\in\mathcal N_i}|\alpha_j(u_i)|\,|\Phi_j|_{H^1(\Omega)}\right)^2$$
$$\le \sum_{i=1}^J\mathrm{card}(\mathcal N_i)\sum_{j\in\mathcal N_i}\alpha_j^2(u_i)|\Phi_j|^2_{H^1(\Omega)} \le CH^{d-2}H^{-d}\sum_{i=1}^J\sum_{j\in\mathcal N_i}\|u_i\|^2_{L^2(\Omega_j)} \qquad \text{[Assumption 3.1.2 a) and (3.12)]}$$
$$\le CH^{-2}\sum_{i=1}^J\|u_i\|^2_{L^2(B_i)} \le C\sum_{i=1}^J|u_i|^2_{H^1(B_i)} \qquad \text{[used the Friedrichs inequality]}$$
$$= C\sum_{i=1}^J|u_E|^2_{H^1(B_i)} \le C|u|^2_{H^1(\Omega)} \qquad \text{[used bounded intersections and (3.19)]},$$
which together with (3.18) completes the proof of (3.14). $\square$
The previous lemma allows us to prove the existence of an $H^1$-stable decom-
position of a function $u \in V_h$ into subdomain components.
Lemma 3.1.7. Under Assumptions 3.1.2, 3.1.1, for every finite ele-
ment function $u \in V_h$, there is a (not necessarily unique) decomposition $\{u_i\}_{i=0}^J$,
$u_i \in V_i$, such that
$$u = \sum_{i=0}^J u_i \quad \text{and} \quad \sum_{i=0}^J|u_i|^2_{H^1(\Omega)} \le C\|u\|^2_{H^1(\Omega)},$$


where the constant $C$ is independent of $h$, $H$.
Proof. Let us define the operator $I_h : C^0(\overline\Omega) \to V_h$ by
$$I_h(u) = \sum_{i=1}^n u(v_i)\phi_i = \Pi\left((u(v_i))_{i=1}^n\right),$$
where $\{v_i\}_{i=1}^n$ is the set of finite element nodal points, $\{\phi_i\}_{i=1}^n$ is the finite element
basis, and $\Pi x = \sum_i x_i\phi_i$ is the finite element interpolator. Let us consider the
functions $\psi_i$ from Lemma 3.1.4. As $\mathrm{diam}(\mathrm{supp}(\psi_i)) \le CH$ and $|\psi_i|_{W^{1,\infty}(\Omega)} \le
CH^{-1}$, we also have
$$\|\psi_i\|_{L^\infty(\Omega)} \le C.$$
Let us define the decomposition
$$u_0 = Qu, \qquad u_i = I_h(\psi_i w), \quad i = 1,\dots,J, \qquad \text{where } w = (I - Q)u$$
and $Q$ is the interpolation operator from Lemma 3.1.5. As $w$ is a finite element
function and $\sum_{i=1}^J\psi_i = 1$,
$$\sum_{i=1}^J u_i = I_h\left(\sum_{i=1}^J\psi_i w\right) = I_h(w) = w,$$
proving the validity of the first statement of this lemma. Further, for $i = 1,\dots,J$ it
holds that
$$|u_i|_{H^1(\Omega_i)} \le C|\psi_i w|_{H^1(\Omega_i)} \le C\|\nabla(\psi_i w)\|_{[L^2(\Omega_i)]^d} = C\|w\nabla\psi_i + \psi_i\nabla w\|_{[L^2(\Omega_i)]^d}$$
$$\le C\left(\|w\nabla\psi_i\|_{[L^2(\Omega_i)]^d} + \|\psi_i\nabla w\|_{[L^2(\Omega_i)]^d}\right) \le C\left(|\psi_i|_{W^{1,\infty}(\Omega_i)}\|w\|_{L^2(\Omega_i)} + \|\psi_i\|_{L^\infty(\Omega_i)}\|\nabla w\|_{[L^2(\Omega_i)]^d}\right)$$
$$\le C\left(H^{-1}\|w\|_{L^2(\Omega_i)} + |w|_{H^1(\Omega_i)}\right).$$
Therefore, owing to the bounded intersection property of the subdomains
$\Omega_i$, the approximation property (3.13) and the energetic stability (3.14),
$$\sum_{i=0}^J|u_i|^2_{H^1(\Omega)} \le C\left(|Qu|^2_{H^1(\Omega)} + H^{-2}\|(I-Q)u\|^2_{L^2(\Omega)} + |(I-Q)u|^2_{H^1(\Omega)}\right) \le C|u|^2_{H^1(\Omega)},$$
which completes the proof of the second statement of this lemma. $\square$
To complete the proof of Theorem 3.1.3, we need the following two
abstract results.
Lemma 3.1.8 ([5]). Let $V$ be a Hilbert space with an inner product $a(\cdot,\cdot)$.
Further, let the operators $P_i : V \to V_i$ be $a(\cdot,\cdot)$-orthogonal projectors. Then, if there
exists a constant $C_L > 0$ such that
$$\forall v \in V\ \exists v_i \in V_i : v = \sum_{i=0}^J v_i \quad \text{and} \quad \sum_{i=0}^J a(v_i,v_i) \le C_L\,a(v,v),$$
then the smallest eigenvalue of $\sum_{i=0}^J P_i$ with respect to $a(\cdot,\cdot)$ satisfies
$$\lambda_{\min}\left(\sum_{i=0}^J P_i\right) \ge \frac{1}{C_L}.$$


The following lemma is a straightforward simplification of [89, Theo-
rem 3.2] suitable for our purposes.
Lemma 3.1.9. Assume that
(i) there exists a constant $C_L$ such that
$$a(v,v) \le C_L\,a\left(\sum_{i=0}^J P_i v, v\right) \quad \forall v \in V;$$
(ii) $\mathcal E = (\varepsilon_{ij})_{i,j=1}^J$ is a symmetric matrix such that
$$a(P_iu, P_jv) \le \varepsilon_{ij}\,a(P_iu, u)^{1/2}a(P_jv, v)^{1/2} \quad \forall u,v \in V, \quad i,j = 1,\dots,J.$$
Then the product algorithm with error propagation operator
$$(I-P_0)(I-P_1)\cdots(I-P_J)$$
is convergent with the rate bounded by
$$\gamma = 1 - \frac{1}{C_L\left(1+\varrho(\mathcal E)\right)^2}.$$
We can now return to proving Theorem 3.1.3. The assumption (i)
of Lemma 3.1.9 follows from Lemmas 3.1.7 and 3.1.8. From the bounded overlaps
property of the subdomains $\Omega_j$ we obtain
$$\varrho(\mathcal E) \le C,$$
where the constant $C$ is independent of the numbering of the spaces $V_i$, $i = 1,\dots,J$.
Therefore, Lemma 3.1.9 yields
$$\left\|(I-P_0)\prod_{i=1}^{n_c}\prod_{j\in\mathcal C_i}(I-P_j)\right\|_A \le 1 - \frac1C,$$
where $\{\mathcal C_i\}_{i=1}^{n_c}$ are the decomposition sets introduced at the beginning of this
section. Now, the proof of Theorem 3.1.3 follows from (3.5).


3.2 Smoothed Aggregation Coarse Space and BOSS
In this section we define a coarse space based on the concept of smoothed
aggregation introduced in [87]. Overlapping subdomains will be defined based
on the nonzero structure of the prolongator. The method described here allows
black-box implementation; its only input is a system of linear algebraic equa-
tions Ax = f and the system of aggregates of degrees of freedom. Assumptions
on aggregates allowing the proof of uniform convergence will be given in the next
section.
Let $\{\mathcal A_i\}_{i=1}^J$ be a given system of aggregates of nodes forming a disjoint
covering of the set of all unconstrained nodes, i.e.
$$\bigcup_{i=1}^J\mathcal A_i = \{1,\dots,n\}, \qquad \mathcal A_i \cap \mathcal A_j = \emptyset \text{ for } i \ne j.$$
We define vectors $1_i \in \mathbb R^n$ as follows:
$$(1_i)_j = \begin{cases} 1 & \text{for node } j \in \mathcal A_i,\\ 0 & \text{elsewhere.}\end{cases} \qquad (3.24)$$
Then we define the tentative prolongator $\bar P$ to be an $n$ by $J$ matrix such that its
$i$th column is equal to $1_i$.
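The tentative prolongator is cheap to build explicitly. The following SciPy sketch assembles it from a list of aggregates; the function name, argument layout, and the CSR storage choice are illustrative assumptions.

```python
import numpy as np
from scipy import sparse

def tentative_prolongator(aggregates, n):
    """Build the n-by-J 0/1 tentative prolongator of (3.24).
    `aggregates` is a list of J disjoint index arrays covering
    {0,...,n-1}; column i is the indicator vector 1_i of aggregate i."""
    rows = np.concatenate(aggregates)
    cols = np.concatenate([np.full(len(a), i)
                           for i, a in enumerate(aggregates)])
    data = np.ones(len(rows))
    return sparse.csr_matrix((data, (rows, cols)),
                             shape=(n, len(aggregates)))
```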
Note that this grouping of nodes into disjoint sets and the subsequent iden-
tification of each set with a single degree of freedom on the coarse space is re-
ferred to in the literature as the aggregation technique. It was introduced in the
early 1950s by Leontief [58] and frequently used in problems of economic
modeling (cf. Mandel and Sekerka [68] and the references therein).
In order to eliminate oscillatory components from the range of P, we
introduce an n by n prolongator smoother S and define the final prolongator P
68


by
P = SP.
(3.25)
The following algorithm describes the construction of the polynomial pro-
longator smoother $S$ suitable for our purpose. The key property of the re-
sulting smoother is that $\varrho(S^2A) \le C\deg(S)^{-2}\varrho(A)$, which guarantees that
the $H^1$-seminorm of our coarse-space basis functions is sufficiently small (Lem-
mas 3.3.2, 3.3.3). Note that there is no need to physically construct the prolon-
gator smoother $S$. For the sake of the method's implementation, we need only
the final prolongator $P = S\bar P$, which can be generated in $O(n\deg(S))$ operations
in a single processor environment.
Algorithm 6. For a desired degree $d_s$ of the prolongator smoother $S$
and an estimate $\bar\varrho$ of the spectral radius of $A$ such that
$$\varrho(A) \le \bar\varrho \le C_\varrho\,\varrho(A), \qquad (3.26)$$
we define the prolongator smoother by
1. Let $K = \lfloor\log_3(2d_s+1)\rfloor - 1$, where $\lfloor\cdot\rfloor$ is the truncation to the nearest
smaller integer.
2. For $i \ge 0$, set $\bar\varrho_i = \left(\frac19\right)^i\bar\varrho$ and compute $S = \prod_{i=0}^K\left(I - \frac{4}{3\bar\varrho_i}A_i\right)$, where $A_i$ is
defined by the recurrence formula
$$A_0 = A, \qquad A_{i+1} = \left(I - \frac{4}{3\bar\varrho_i}A_i\right)^2 A_i. \qquad (3.27)$$
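A short sketch of how $P = S\bar P$ can be computed from Algorithm 6 without ever forming $S$ follows; `rho_bar` is an upper spectral radius estimate as in (3.26), obtained e.g. from a few power iterations. Forming the matrices $A_i$ explicitly, as done here, fills in quickly and is kept only for clarity; a practical code applies the recurrence (3.27) matrix-free.

```python
import numpy as np
from scipy import sparse

def smoothed_prolongator(A, P_bar, d_s, rho_bar):
    """P = S P_bar for the smoother of Algorithm 6 (a sketch)."""
    K = int(np.floor(np.log(2 * d_s + 1) / np.log(3.0))) - 1
    P = P_bar.copy()
    A_i, rho_i = A.copy(), rho_bar
    I = sparse.identity(A.shape[0], format='csr')
    for i in range(K + 1):
        P = P - (4.0 / (3.0 * rho_i)) * (A_i @ P)  # multiply one factor in
        if i < K:
            T = I - (4.0 / (3.0 * rho_i)) * A_i
            A_i = T @ (T @ A_i)                    # recurrence (3.27)
            rho_i /= 9.0
    return P
```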


Remark 3.2.1. Algorithm 6 is capable of generating smoothers $S$ of
certain degrees only. The choice of $K$ in step 1 gives $S$ of degree closest to
the desired one $d_s$, see (3.29).
If we set for $i \in \mathbb N$
$$S_i = \prod_{j=0}^{i-1}\left(I - \frac{4}{3\bar\varrho_j}A_j\right),$$
then for the prolongator smoother $S$ created by Algorithm 6 we have $S =
S_{\lfloor\log_3(2d_s+1)\rfloor}$. As
$$\deg(A_i) = \deg\left(\left(I - \frac{4}{3\bar\varrho_{i-1}}A_{i-1}\right)^2 A_{i-1}\right) = 3\deg(A_{i-1}) = 3^i,$$
we get
$$\deg(S_i) = \deg(S_{i-1}) + \deg(A_{i-1}) = \deg(S_{i-1}) + 3^{i-1} = \sum_{j=0}^{i-1}3^j = \frac{3^i-1}{2}. \qquad (3.28)$$
Therefore,
$$\frac{d_s-1}{3} \le \deg(S) \le d_s. \qquad (3.29)$$
The nonzero structure of the prolongator $P$ determines the supports of
our coarse space basis functions
$$\Phi_i = \Pi P e_i. \qquad (3.30)$$
Here, $\Pi x = \sum_{i=1}^n x_i\phi_i$ is the finite element interpolator, $\{\phi_i\}$ the finite element
basis and $e_i$ the $i$th vector of the canonical basis.
The prolongator $P$ is obtained as a result of the matrix multiplication
$P = S\bar P$, where $\bar P$ is the tentative prolongator and $S$ is the polynomial in the
stiffness matrix $A$ given by Algorithm 6.


The computational subdomains $\Omega_i$ are derived from the nonzero struc-
ture of the matrix $P^{symb}$, which is obtained in the same way as the prolongator
$P$, except that the matrix operations involved are performed only symbolically. That
is, we replace the nonzero entries of the matrices by ones and use the arithmetic
$$1 + a = 1, \quad 0 + a = a, \quad 1 * a = a, \quad 0 * a = 0, \quad \text{for } a = 0,1.$$
Then, defining
$$\Omega_i = \mathrm{supp}(\Pi P^{symb}e_i), \qquad (3.31)$$
we have
$$\mathrm{supp}(\Phi_i) \subset \Omega_i. \qquad (3.32)$$
Note that if the sparse matrix operations are implemented so that numerical
zeros are never dropped, the results of these symbolic operations are obtained
for free as a side benefit of the computation, so the symbolic operations need not
be performed at all.
Algorithmically, this can be accomplished as follows: First, for each
column of the smoothed prolongator $P^{symb}$ let us define the list of its nonzeroes
$$\mathcal N_j = \{i : P^{symb}_{ij} \ne 0\}, \qquad n_j = \mathrm{card}(\mathcal N_j)$$
and the $n$ by $n_j$ 0--1 matrix $N_j$ resulting from selecting the columns with indices
in $\mathcal N_j$ from the $n$ by $n$ identity matrix. Further, we define local matrices $A_i$ and
local correction operators $R_i$,
$$A_i = N_i^T A N_i, \qquad R_i = N_i(A_i)^{-1}N_i^T, \quad i = 1,\dots,J. \qquad (3.33)$$
Analogously, for the coarse level we set
$$A_0 = P^T A P, \qquad R_0 = P(A_0)^{-1}P^T. \qquad (3.34)$$


Figure 3.1: Possible assignment of elements to different classes Ci in 2D.
For a positive $i$, $R_iA$ is the $A$-orthogonal projection onto the local space
$$V_i = \{x \in \mathbb R^n : x_j = 0 \text{ for } j \notin \mathcal N_i\}.$$
Note that $(V_i, \|\cdot\|_A)$ is the vector space isometrically isomorphic to the space
of finite element functions $\{\Pi x : x \in V_i\}$ equipped with the norm $a(\cdot,\cdot)^{1/2}$. Also, the spaces $V_i$ intro-
duced this way satisfy (3.3).
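The setup of (3.33)-(3.34) can be sketched as follows. Here the dof list of subdomain $j$ is read off the nonzero rows of column $j$ of `P` (standing in for $P^{symb}$, under the assumption that no numerical cancellation occurs), and small dense inverses stand in for the $R_i$; all names are illustrative.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import splu

def build_corrections(A, P):
    """Local and coarse correction solvers of (3.33)-(3.34), a sketch."""
    A = sparse.csc_matrix(A)
    locals_ = []
    for j in range(P.shape[1]):
        idx = P.getcol(j).nonzero()[0]          # N_j: dofs of subdomain j
        A_j = A[idx, :][:, idx].toarray()       # A_j = N_j^T A N_j
        locals_.append((idx, np.linalg.inv(A_j)))
    A0 = sparse.csc_matrix(P.T @ A @ P)         # coarse matrix (3.34)
    return locals_, splu(A0)
```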
For the sake of parallelism, we need a disjoint covering $\{\mathcal C_i\}_{i=1}^{n_c}$ of the
set $\{1,\dots,J\}$ satisfying
$$\cos(V_j, V_k) = 0 \quad \text{for every } j,k \in \mathcal C_i,\ j \ne k, \quad i = 1,\dots,n_c, \qquad (3.35)$$
where the cosine is measured in the $A$-inner product. For structured meshes, such
a covering can easily be obtained. For simplicity of demonstration, consider 2D
and 3D rectangular meshes. Figures 3.1 and 3.2 depict possible decompositions
into noncontiguous groups forming $\mathcal C_i$ in 2D and 3D, respectively.


Figure 3.2: Possible assignment of elements to different classes Ci in 3D.


We can see that for rectangular meshes the elements can be subdivided into
four noncontiguous groups in 2D and into eight in 3D. We are, however, seeking
a similar decomposition for any (unstructured) mesh. For unstructured meshes,
such a decomposition can be created using a simple greedy algorithm, as the
information about the orthogonality of the spaces $V_i$ is available. Trivially, the spaces
$V_i$ and $V_j$ are orthogonal if
$$a_{kl} = 0 \quad \text{for all } k \in \mathcal N_i,\ l \in \mathcal N_j,$$
where $a_{kl}$ are the entries of the stiffness matrix $A$. Such a test can easily be performed
using the formula
$$A_c^{symb} = P^{symb\,T} * A^{symb} * P^{symb},$$
where $*$ is the operation of symbolic matrix multiplication. Then,
if $(A_c^{symb})_{ij} = 0$ then $\cos(V_i, V_j) = 0$.
The disjoint covering $\{\mathcal C_i\}_{i=1}^{n_c}$ of the set of coarse space degrees of freedom satis-
fying (3.35) can be created using the following algorithm:
Algorithm 7. Set $\mathcal R = \{1,\dots,J\}$, $i = 0$ and
1. repeat
2. set $i \leftarrow i+1$,
3. set $\mathcal R_i = \mathcal R$, $\mathcal C_i = \emptyset$,
4. repeat
5. choose $j \in \mathcal R_i$,
6. set $\mathcal C_i \leftarrow \mathcal C_i \cup \{j\}$,
7. for each $k$ : $(A_c^{symb})_{jk} = 1$ set $\mathcal R_i \leftarrow \mathcal R_i\setminus\{k\}$,
8. until $\mathcal R_i$ is empty,
9. set $\mathcal R \leftarrow \mathcal R\setminus\mathcal C_i$,
10. until $\mathcal R$ is empty,
11. set $n_c = i$.
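Algorithm 7 translates almost directly into code. In the sketch below, `Ac_pattern[j]` is assumed to hold the set of indices $k$ with $(A_c^{symb})_{jk}$ nonzero (including $j$ itself); the function returns the mutually $A$-orthogonal classes $\mathcal C_i$.

```python
def color_subdomains(Ac_pattern, J):
    """Greedy construction of the classes C_i of Algorithm 7."""
    remaining = set(range(J))                 # R
    classes = []
    while remaining:
        candidates = set(remaining)           # R_i
        group = []                            # C_i
        while candidates:
            j = candidates.pop()              # steps 5-6
            group.append(j)
            candidates -= Ac_pattern[j]       # step 7: drop neighbors
        classes.append(group)
        remaining -= set(group)               # step 9
    return classes
```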
Now we have all the components needed to write down the implemen-
tation of the Schwarz method with the error propagation operator (3.4).
Algorithm 8 (BOSS). Given a vector $x^\ell$, the method returns $x^{\ell+1}$
computed as follows:
1. Set $z = x^\ell$.
2. Local corrections:
for $i = 1,\dots,n_c$ do
$$z \leftarrow z + \sum_{j\in\mathcal C_i}R_j d, \quad \text{where } d = f - Az$$
and $R_j$ is the correction operator defined in (3.33).
3. Coarse level correction:
$$z \leftarrow z + R_0(f - Az),$$
where $R_0$ is the coarse-level correction operator given by (3.34).
4. (optional) for $i = n_c,\dots,1$ do
$$z \leftarrow z + \sum_{j\in\mathcal C_i}R_j d, \quad \text{where } d = f - Az.$$
5. Set $x^{\ell+1} = z$.
Note that if the optional post-smoothing step 4 is used, the algorithm
can be used as a symmetric preconditioner in the conjugate gradient method.
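Combining the pieces, one BOSS sweep can be sketched as follows; `build_corrections` and `color_subdomains` are the illustrative helpers from the sketches above, and the symmetric reverse sweep corresponds to the optional step 4.

```python
import numpy as np

def boss_iteration(x, f, A, classes, locals_, coarse_lu, P, symmetric=True):
    """One sweep of Algorithm 8 (BOSS), a sketch."""
    z = x.copy()

    def local_sweep(order):
        nonlocal z
        for group in order:
            d = f - A @ z                    # one residual per class C_i
            for j in group:                  # independent, parallelizable
                idx, A_inv = locals_[j]
                z[idx] += A_inv @ d[idx]     # z += R_j d

    local_sweep(classes)                     # step 2: forward corrections
    d = f - A @ z                            # step 3: coarse correction
    z += P @ coarse_lu.solve(P.T @ d)
    if symmetric:                            # step 4: reverse sweep
        local_sweep(classes[::-1])
    return z
```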


Remark 3.2.2. For the sake of brevity, we denote the method de-
scribed by Algorithm 8 with the choice of operators Ri, Rq given by (3.33), (3.34)
as BOSS, short for Black-box Overlapping Schwarz with Smoothed Coarse Space.
We will find this abbreviation useful in the tables of Chapter 6.
3.3 Estimates for Smoothed Aggregation
In this section, we apply the general estimates to the smoothed aggregation
method defined in the previous section. In order to prove the convergence of the
method with the coarse space generated by smoothed aggregations, we only
have to verify that Assumptions 3.1.2 and 3.1.1 are satisfied.
The pattern of the stiffness matrix $A = (a_{ij})_{i,j=1}^n$ determines the undi-
rected graph
$$\mathcal G = (\mathcal V, \mathcal E),$$
where the vertices $\mathcal V = \{1,\dots,n\}$ are indices of all unconstrained nodes and the edges
$\mathcal E$ are given by
$$\mathcal E = \{[i,j] \in \mathcal V\times\mathcal V : a_{ij} \ne 0\}.$$
For $i \in \mathcal V$ and a positive integer $r$ let us define the graph $r$-neighborhood of $i$
by
$$B(i,r) = \{j \in \mathcal V : \mathrm{dist}(i,j) \le r\}.$$
Here, the distance of two vertices $i, j$ is the minimal length of a path connecting
$i,j$, measured in the number of edges forming the path.
In the rest of this section, we will prove the optimal convergence re-
sult under the following assumption on the system of aggregates $\{\mathcal A_i\}$. The
first part of the assumption controls the aspect ratios of the aggregates. The second
part specifies the number of smoothing steps involved in the construction of the
prolongator smoother.
Assumption 3.3.1. There are positive constants $c$, $C$,
and a positive integer $\alpha$ characterizing the graph size of the aggregates such that
a) in each aggregate $\mathcal A_i$ there is a node $j$ satisfying
$$B(j, c\alpha) \subset \mathcal A_i, \quad \text{and} \quad \mathrm{dist}(k,j) \le C\alpha \text{ for every } k \in \mathcal A_i;$$
b) for the degree $d_s$ of the prolongator smoother it holds that $c\alpha \le d_s \le C\alpha$.
The decomposition satisfying Assumption 3.3.1 can easily be gen-
erated using a simple greedy algorithm, see Algorithm 9 in Section 3.4.
The following simple algebraic result is the key tool needed for the verifi-
cation of Assumption 3.1.2 a).
Lemma 3.3.2. For the prolongator smoother $S$ created by Algorithm 6
it holds that
$$\varrho(S^2A) \le C\deg(S)^{-2}\varrho(A) \quad \text{and} \quad \varrho(S) \le 1.$$
Proof ([88]). Throughout this proof, we use the notation introduced
in Algorithm 6. First, by induction, we prove $\varrho(A_i) \le \bar\varrho_i$. For $i = 0$, the
inequality holds by (3.26); assume it holds for $j \le i$. Then, by
$$A_{i+1} = \left(I - \frac{4}{3\bar\varrho_i}A_i\right)^2 A_i$$
and the inductive assumption,
$$\varrho(A_{i+1}) = \max_{t\in\sigma(A_i)}\left(1 - \frac{4t}{3\bar\varrho_i}\right)^2 t \le \max_{t\in[0,\bar\varrho_i]}\left(1 - \frac{4t}{3\bar\varrho_i}\right)^2 t = \frac{\bar\varrho_i}{9} = \bar\varrho_{i+1}.$$
Hence
$$\varrho(A_i) \le \left(\frac19\right)^i\bar\varrho = \bar\varrho_i.$$
Now, the second estimate $\varrho(S) \le 1$ follows immediately from the fact that $S$ is
a product of terms of the form $I - \frac{4}{3\bar\varrho_i}A_i$, whose spectra lie in $[-\frac13, 1]$.
Let us estimate the spectral bound of $S^2A$. It is routine to verify that
$$S^2A = A_{K+1}, \quad \text{where } K = \lfloor\log_3(2d_s+1)\rfloor - 1.$$
Therefore, taking into account that (see Remark 3.2.1) $3^{K+1} \ge Cd_s \ge C\deg(S)$
and $\bar\varrho_0 = \bar\varrho \le C_\varrho\,\varrho(A)$ (see (3.26)), we get
$$\varrho(S^2A) \le \left(\frac19\right)^{K+1}\bar\varrho \le C\deg^{-2}(S)\,\varrho(A),$$
which was to be proved. $\square$
The following lemma demonstrates the validity of Assumption 3.1.2 a).
Lemma 3.3.3. Under Assumption 3.3.1, for the coarse space basis
functions defined by (3.30) it holds that
$$|\Phi_i|^2_{H^1(\Omega)} \le CH^{d-2} \quad \text{and} \quad \|\Phi_i\|^2_{L^2(\Omega)} \le CH^d, \qquad (3.36)$$
where $H = \alpha h$.
Proof. Taking into account the underlying quasiuniform P1 or Q1
finite element mesh, the number $\alpha$ characterizes the discrete diameter of the ag-
gregates $\mathcal A_i$, and we have
$$\mathrm{card}(\mathcal A_i) \le C\alpha^d.$$
Further, due to Lemma 3.3.2,
$$\varrho(S^2A) \le C\alpha^{-2}\varrho(A).$$
Therefore, using the fact that $\varrho(A) \le Ch^{d-2}$,
$$|\Phi_i|^2_{H^1(\Omega)} \le Ca(\Pi S\bar Pe_i, \Pi S\bar Pe_i) = C(AS1_i, S1_i) \le C\alpha^{-2}\varrho(A)\,\mathrm{card}(\mathcal A_i) \le C\alpha^{d-2}h^{d-2} = CH^{d-2}.$$
Similarly, using the fact that $\varrho(S) \le 1$ (Lemma 3.3.2),
$$\|\Phi_i\|^2_{L^2(\Omega)} = \|\Pi S1_i\|^2_{L^2(\Omega)} \le Ch^d(S1_i, S1_i) \le Ch^d\varrho(S^2)\,\mathrm{card}(\mathcal A_i) \le Ch^d\alpha^d \le CH^d,$$
completing the proof. $\square$
Remark 3.3.4. The estimate for the smoothed functions $\Phi_i$ in Lemma
3.3.3 is a significant improvement over the case of the unsmoothed functions $\Pi 1_i$,
for which we can only prove $|\Pi 1_i|^2_{H^1(\Omega)} \le CH^{d-1}/h$.
Now we are ready to complete the verification of Assumption 3.1.2 for
smoothed aggregations.
Lemma 3.3.5. Under Assumption 3.3.1, the coarse-space basis $\{\Phi_i\}$
generated by the smoothed aggregation technique described in the previous section
satisfies Assumption 3.1.2.
Proof. Assumption a) follows from Lemma 3.3.3. Assumption c)
has been verified in the previous section, see (3.32). Let us prove
b). The basis functions derived from the tentative prolongator $\bar P$, i.e.
$$\bar\Phi_i = \Pi\bar Pe_i, \quad i = 1,\dots,J,$$
satisfy the decomposition of unity
$$\sum_{i=1}^J\bar\Phi_i = 1 \qquad (3.37)$$
everywhere on $\Omega\setminus B_{\Gamma_D}$, where $B_{\Gamma_D}$ is the union of the elements $\tau \in \mathcal T_h$ adjacent to
the constrained nodes. Discretely, (3.37) holds in every unconstrained finite element nodal point. Let $\mathcal V$
be the index set of all finite element nodal points $v_i \in B_{\Gamma_D}$ ($B_{\Gamma_D}$ is understood
as a closed domain). The vector of units is a local kernel of the matrix $A$. More
precisely, for the vector of ones $u \in \mathbb R^n$, it holds that
$$(Au)_i = 0 \quad \text{for every } i \notin \mathcal V.$$
Further, for a positive $k$ and the vector of units $u$,
$$(A^ku)_i = 0 \quad \text{for every } i \text{ such that } \mathrm{dist}(i,\mathcal V) \ge k+1,$$
where dist is the graph distance introduced at the beginning of this section. The
prolongator smoother $S$ is a polynomial in the stiffness matrix $A$ with the abso-
lute term equal to $I$. Therefore, for the vector
$$w = \sum_{i=1}^J Pe_i = \sum_{i=1}^J S\bar Pe_i = Su$$
we have
$$w_i = 1 \quad \text{for every } i \text{ such that } \mathrm{dist}(i,\mathcal V) > \deg(S).$$
As $\sum_i\Phi_i = \Pi Su = \Pi w$, the decomposition of unity $\sum_i\Phi_i = 1$ is violated at most at
the nodes within graph distance $\deg(S)$ of $\mathcal V$, that is, within distance $C\deg(S)h$ of $\Gamma_D$.
Noting that $\deg(S) \le d_s \le C\alpha$ and $H = \alpha h$ completes the proof. $\square$
Lemma 3.3.6. Under Assumption 3.3.1, the computational subdo-
mains $\Omega_i$ defined by (3.31) satisfy Assumption 3.1.1.
Proof. The proof consists of simple, but rather tedious geometrical
considerations. The computational subdomains are defined by $\Omega_i = \mathrm{supp}(\Pi S^{symb} *
\bar Pe_i)$, where $*$ is the operation of symbolic matrix multiplication and $S^{symb}$
is the polynomial in $A$ created using symbolic matrix operations too. The degree
of $S^{symb}$ satisfies
$$c\alpha \le \deg(S^{symb}) \le C\alpha.$$
Further, the support of the basis function derived from the tentative prolongator,
$$\mathrm{supp}(\Pi\bar Pe_i),$$
is formed by all elements $\tau_j$ such that at least one vertex of $\tau_j$ belongs to the
aggregate $\mathcal A_i$. The smoothing by $S^{symb}$ adds $\deg(S^{symb})$ surrounding layers of
finite elements.
Taking into account the quasiuniformity of the underlying mesh and
the fact that $H = \alpha h \approx \deg(S^{symb})h$, the measure of the added $\deg(S^{symb})$ layers of
elements itself is greater than or equal to $CH^d$. So,
$$\mathrm{meas}(\Omega_i) \ge CH^d,$$
which proves Assumption 3.1.1 d). Also, due to Assumption 3.3.1 a), we
similarly obtain $\mathrm{diam}(\Omega_i) \le CH$. Hence a) is also verified.
Let us prove b). The supports of the basis functions $\Pi\bar Pe_i$ cover the domain
$\Omega$. As $\Omega_i$ is created by adding $\deg(S^{symb}) \ge c\alpha$ layers of elements to $\mathrm{supp}(\Pi\bar Pe_i)$,
we have
$$\mathrm{dist}_{\mathbb R^d}(x, \partial\Omega_i\setminus\partial\Omega) \ge cH \quad \forall x \in \mathrm{supp}(\Pi\bar Pe_i),$$
where $\mathrm{dist}_{\mathbb R^d}(\cdot,\cdot)$ is the Euclidean distance. As every point $x \in \Omega$ belongs to
some $\mathrm{supp}(\Pi\bar Pe_i)$, b) is proved.
It remains to verify c). The first part of Assumption 3.3.1 a) says
that each aggregate $\mathcal A_i$ contains a graph ball of a radius $r \ge c\alpha$, where $c$ is a
positive integer constant. Let us interpret this assumption geometrically in $\mathbb R^d$.
For each aggregate of vertices $\mathcal A_i$ let us define the cluster $\mathcal C_i$ consisting
of all finite elements $\tau_j$ such that all vertices of $\tau_j$ belong to $\mathcal A_i$. From the first
part of Assumption 3.3.1 a), it follows that there is a ball $B_i \subset \mathcal C_i$ such that
$\mathrm{diam}(B_i) \ge cH$. As the aggregates $\mathcal A_i$ are disjoint, the clusters $\mathcal C_i$ and the balls $B_i$
are disjoint as well. Summing up, we have proved the following properties of the
subdomains $\Omega_i$:
$$\mathrm{diam}(\Omega_i) \le CH.$$
For each $\Omega_i$ there is a ball $B_i \subset \Omega_i$ such that $\mathrm{diam}(B_i) \ge cH$, and the balls
$B_i$, $i = 1,\dots,J$, are mutually disjoint.
From here, Assumption 3.1.1 c) follows. $\square$
We have verified that Assumptions 3.1.2 and 3.1.1 are satisfied, hence
we can apply Theorem 3.1.3 to prove the convergence of the method with the coarse
space given by smoothed aggregations.
Theorem 3.3.7. Let Assumption 3.3.1 hold. Then, for the error
propagation operator $T$ of the method described in Section 3.2 applied to
the model problem (3.2) it holds that
$$\|T\|_A \le 1 - \frac1C < 1,$$
where $C$ is a constant independent of $h$ and the size of the aggregates.
Proof. The proof follows immediately from Lemmas 3.3.5, 3.3.6 and
Theorem 3.1.3. $\square$
3.4 Practical Issues
The overlapping method with the coarse space given by smoothed ag-
gregations has very favorable convergence properties common to most overlap-
ping Schwarz methods with a coarse space. The new method has, however,
certain advantages over the existing overlapping methods. The advantages of
the smoothed aggregation approach include: It can be implemented as a black
box with no input required from the user except for the stiffness matrix and the
right hand side of the problem. Even though the analysis assumes an elliptic
functional discretized on a quasi-uniform mesh by P1 or Q1 finite elements, nu-
merical experiments confirm applicability of the method to unstructured meshes
and a variety of finite element types and problems far beyond the scope of the
current theory.
The disadvantage common to all overlapping-type domain decomposi-
tion methods is the increase of computational complexity with increasing mea-
sure of the overlap. Our method cannot avoid this drawback, but the use of
coloring in the definition (3.4) allows parallel implementation of the local solves


and reduces the processing time.
In the rest of this section, we discuss ways to generalize the method for
solving nonscalar problems. We also analyze the computational complexity of
the method and a practical algorithm that can be used to generate the aggregates
$\mathcal A_i$.
3.4.1 Generation of Aggregates
We now describe a greedy algorithm which will generate the subdo-
mains satisfying Assumption 3.3.1. First we extend the definition of the graph
neighborhood of a node to the graph neighborhood of a set $X \subset \{1,\dots,n\}$ of
nodes:
$$B(X,\alpha) = \{i : \mathrm{dist}(i,X) \le \alpha\}.$$
With this definition, we can write the
Algorithm 9. For the given stiffness matrix $A$ and positive integer $\alpha$,
create the system of aggregates $\{\mathcal A_i\}$ as follows:
1. Set $\mathcal R = \{1,\dots,n\}$, $j = 0$.
2. for $i = 1,\dots,n$ do
3. if $i \in \mathcal R$ then
4. if $B(\{i\},\alpha) \subset \mathcal R$, then
5. $j \leftarrow j+1$,
6. $\mathcal A_j \leftarrow B(\{i\},\alpha)$, $\mathcal R \leftarrow \mathcal R\setminus\mathcal A_j$,
7. end if
8. end if
9. end for
10. set $J = j$,
11. for $i = 1,\dots,J$
12. $\mathcal A_i \leftarrow \mathcal A_i \cup (B(\mathcal A_i,\alpha)\cap\mathcal R)$,
13. $\mathcal R \leftarrow \mathcal R\setminus\mathcal A_i$,
14. end for
In order to fully complete the aggregate generation description, we give
an algorithmic recipe for computing the $\alpha$-neighborhood of a set $X$ used in
Algorithm 9.
Algorithm 10. Given a set $X \subset \{1,\dots,n\}$,
1. Set $w \in \mathbb R^n$ as $w_i = \begin{cases}1 & \text{if } i \in X,\\ 0 & \text{otherwise.}\end{cases}$
2. Set $w \leftarrow A^\alpha w$ (both the power and the multiplication are performed
symbolically).
3. Set $B(X,\alpha) = \{i : w_i = 1\}$.
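A compact realization of Algorithms 9 and 10 follows; instead of symbolic matrix powers, the $\alpha$-neighborhoods are computed by breadth-first search on the matrix graph, which is equivalent and runs in time proportional to the number of edges visited. `adj[i]` is assumed to list the graph neighbors of node `i` (the pattern of $A$).

```python
from collections import deque

def graph_ball(adj, seeds, alpha):
    """B(X, alpha): all nodes within graph distance alpha of `seeds`."""
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        i = queue.popleft()
        if dist[i] == alpha:
            continue
        for k in adj[i]:
            if k not in dist:
                dist[k] = dist[i] + 1
                queue.append(k)
    return set(dist)

def aggregates(adj, n, alpha):
    """Algorithm 9: greedy seeding, then absorb the leftover nodes."""
    R = set(range(n))
    aggs = []
    for i in range(n):                      # steps 1-9
        if i in R:
            ball = graph_ball(adj, [i], alpha)
            if ball <= R:
                aggs.append(ball)
                R -= ball
    for a in aggs:                          # steps 11-14
        grab = graph_ball(adj, a, alpha) & R
        a |= grab
        R -= grab
    return aggs
```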
Remark 3.4.1. Algorithm 9 generates a disjoint covering $\{\mathcal A_i\}$ of the
set of all vertices that satisfies Assumption 3.3.1. In steps 1.--10., Algorithm 9
generates graph neighborhoods $\mathcal A_j = B(\{i\},\alpha)$. After step 10., the set $\mathcal R$ con-
tains the remaining nodes that could not be made into whole $\alpha$-neighborhoods.
For these nodes it holds that
$$\forall j \in \mathcal R\ \exists\mathcal A_i \text{ such that } \mathrm{dist}(j,\mathcal A_i) \le \alpha.$$
Steps 11. through 13. add at most $\alpha$ layers of surrounding vertices to some
of the aggregates $\mathcal A_i$. It follows from the construction that at the end of Algo-
rithm 9, $\mathcal R = \emptyset$.


3.4.2 Nonscalar Problems
The method can easily be modified for solving nonscalar problems. We
will briefly describe the changes required. This approach first appeared in [87]
in the context of solving problems of order higher than 2. In order to apply
the method to nonscalar problems, we need the knowledge of the discrete rep-
resentation of the local kernel of the bilinear form $a(\cdot,\cdot)$. By local kernel we
mean the kernel in the absence of essential boundary conditions, i.e., the kernel of
the unconstrained problem.
Let us assume that we have functions $\{f^j\}_{j=1}^{n_k}$ spanning the local kernel
of $a(\cdot,\cdot)$ (in the case of 3D elasticity, the 6 rigid body modes). For each function $f^j$ we
need its discrete representation with respect to our finite element basis, i.e., the
vector $\bar f^j$ such that
$$f^j = \Pi\bar f^j.$$
Finite element packages usually provide this information. For every aggregate
$\mathcal A_i$ let us define the set $\mathcal D_i$ of all degrees of freedom associated with the nodes of $\mathcal A_i$.
Then the tentative prolongator can be constructed by
Algorithm 11 (Tentative prolongator, nonscalar problems).
For every aggregate $\mathcal A_i$ and for $j = 1,\dots,n_k$:
1. For $\bar f^j$, compute the vector $f^{ij} \in \mathbb R^n$ with components
$$f^{ij}_k = \begin{cases}\bar f^j_k & \text{if } k \in \mathcal D_i,\\ 0 & \text{otherwise.}\end{cases}$$
2. Interpret the vector $f^{ij}$ as the $n_k(i-1)+j$-th column of the tentative
prolongator $\bar P$.
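A sketch of Algorithm 11, assuming the kernel vectors are supplied column-wise (e.g. the 6 rigid body modes of 3D elasticity); the function and argument names are illustrative.

```python
import numpy as np
from scipy import sparse

def tentative_prolongator_nonscalar(dof_sets, kernel_vecs, n):
    """Algorithm 11: n_k columns per aggregate.

    `dof_sets[i]` lists the dofs D_i of aggregate i; `kernel_vecs` is
    the n-by-n_k array of discrete kernel representations f-bar^j."""
    J, n_k = len(dof_sets), kernel_vecs.shape[1]
    P_bar = sparse.lil_matrix((n, J * n_k))
    for i, dofs in enumerate(dof_sets):
        for j in range(n_k):
            # restriction of f-bar^j to the dofs of aggregate i
            P_bar[dofs, n_k * i + j] = kernel_vecs[dofs, j]
    return P_bar.tocsr()
```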



PAGE 9

5.3 rviodified Two-level lVIultigrid and lVILS 5.4 Practical Issues . 5.4.1 Generalizations 5.4.2 Computational Complexity 6 Numerical Experiments 6.1 l\lodel Problems 6.2 Real \Vorld Problems 7 Conclusions Appendix A Theoretical Results A.1 Friedrichs Inequality B Results Used In Computational Experiments . . . . . B.1 The Stopping Criterion for PCG References IX 126 135 135 136 138 139 148 153 156 156 162 162 165

PAGE 10

FIGURES Figure 2.1 Harmonic extension of a corner peak. . 35 2.2 Harmonic extension of a peak on the edge. 35 2.3 Harmonic extension of a function constant on one edge. 36 2.4 Harmonic extension of a function linear on one edge. . 36 2.5 Harmonic extension of a function with random values on one edge. . . . . . . . . . . . . . . . 37 3.1 Possible assignment of elements to different classes C i in 2D. 72 3.2 Possible assignment of elements to different classes Ci in 3D. 73 4.1 Abstract framework scheme. . . . . . . . . . 96 6.1 The checkerboard coefficient pattern. Dark subdomains correspond to values a1 the light ones to r12 . . . . . . . 145 6.2 The mesh of the automobile wheel (data courtesy of Charbel Farhat, University of Colorado at Boulder). . . . . . 149 6.3 The mesh of the solid with tetrahedron elements (data courtesy of Charbel Farhat, University of Colorado at Boulder). . . 151 6.4 The mesh of the turbine with 123,120 elements (data courtesy of John Abel, Cornell University). . . . . . . . . 152 X

PAGE 11

TABLES Table 6.1 Comparison of BDD with ACDD(k) for different values of kin case of the checkerboard coefficient pattern. . . . . . 140 6.2 Comparison ofBDD with ACDD(k) for problem with exponentially distributed random coefficients. 6.3 Comparison of BDD with ACDD(k) for problem with uniformly distributed random coefficients. . 6.4 Comparison of BDD with BOSS and lVILS for problem with coefficients jumps in a checkerboard pattern form e d by 125 subdomains. Coarse spaces of dimensions 2744 and 125 were used for rviLS and BOSS. Prolongation smoother of degree 1 142 143 was used, and rviLS used 2 pre-smoothers, 2 posts moothers. 144 6.5 Comparison of BDD with BOSS and lVILS for problem with uniformly distributed random coefficients and coarse space of dimension 125. Prolongation smoother of degree 1 was used, and lVILS used 2 pre-smoothers, 2 post-smoothers. . . . 145 6.6 Comparison of BDD with BOSS and lVILS for problem with uniformly distributed random coefficients and coarse space of dimension 125. Prolongation smoother of degree 4, and 4 presmoothers and 4 post-smoothers were used in rviLS. XI 146

PAGE 12

6. 7 Comparison of BDD with BOSS and lVILS for problem with exponentially distributed random coefficients and coarse space of dimension 125. Prolongation smoother of degree 1 was used for both BOSS and lVILS, and 2 pre-smoothers and 2 postsmoothers in rviLS .. 6.8 Comparison of BDD with BOSS and lVILS for problem with exponentially distributed random coefficients and coarse space of dimension 125. Prolongation smoother of degree 4 was used 147 for BOSS and lVILS, and 4 pre-and postsmoothers in lVILS. . 14 7 6.9 Comparison of BDD with BOSS and lVILS for problem with uniforml y distributed random coefficients and coarse space of dimension 2, 744 degree 1 of prolongator smoother, and 2 pre-and postsmoothers in lVILS. . . . . . . . . . . 14 7 6.10 Comparison of BDD with BOSS and lVILS for problem with exponentially distributed random coefficients and coarse space of dimension 2, 7 44, Prolonation smoother of degr e e 1 was used and 2 pres moothers and 2 post-smoothers in lVILS. 6.11 Comparison of BOSS and lVILS for solving the shell problem on a mesh discretizing an automobile wheel with 9,915 nodes and 59,490 degrees of freedom. Prolongator smoother of degree 1 148 was used, and 4 pre-smoothers and 4 post-smoothers in rviLS. 149 Xll

PAGE 13

6.12 Comparison of BOSS and lVILS for solving the 3D elasticity problem with 25,058 nodes and 75,174 degrees of freedom, Pro longator smoother of degree 1 was used and 2 pre-smoothers 2 post-smoothersin lVILS. . . . . . . . . . . 150 6.13 Comparison of BOSS and lVILS for solving a shell problem: a propeller with 41,040 nodes and 123,120 degrees of freedom. Prolongator smoother of degree 1 was used and 4 pre smoothers, 4 post-smoothers in lVILS. . . . . . . . 151 Xlll

PAGE 14

"The sole aim of science is the honor of the human mind, and from this point of view a que s tion about number s is as important as a qu e stion about the s ystem of the world." C. G. J. Jacobi

PAGE 15

1. Introduction and Model Problem 1.1 Prologue After discretization, many problems of engineering practice reduce to the numerical solution of linear systems. These systems are typically very large, sparse unstructured and ill-conditioned. The performance of linear solvers for these discretized problems in terms of computational complexity is usually ap proximated by K,nf:J, where n is the number of degrees of freedom and K,, ,B are constants dependent on the particular choice of solution method. The value of K, is smaller for direct solvers than for iterative ones which makes them less costly for small to medium size problems. Filrthermore, it is typ ically much less sensitive with respect to the conditioning of the problem. This robustness feature is the chief advantage direct solvers provide. Unfortunately, the exponent H is significantly larger for direct solvers, which makes their appli cation prohibitively expensive for large problems. Further, the lack of structure in the problems makes efficient and versatile implementation of direct solvers on modern vector and parallel computers troublesome. Another problematic issue concerning direct solvers is the amount of storage needed to carry out the nec essary computations, as even the most sophisticated direct methods cannot, as a matter of principle avoid fill-in in computing factorization of a matrix. The ever increasing demand to analyze very large finite element sys tems together with the drawbacks of direct solvers suggests considering iterative 1

PAGE 16

solvers as an alternative for solving today's and future problems of practical interest. In order to design an iterative method as a worthy replacement of ex isting state of the art direct solvers for solution of the large problems the issue of robustness has to be resolved. In order to achieve robustness features similar to those of direct methods, it is necessary to find ways of easing the dependence of their constant K on the conditioning of the original problem. A common way to tackle this problem is an application of a conjugate gradient method with a properly chosen preconditioner. For a large class of problems, twoor multilevel methods seem to be the most appropriate choice for a preconditioner because of their convergence and computational complexity properties. It is desirable for commercial solver packages to be able to generate coarse discretizations with the property that smooth error components on a given discretization are well ap proximated by a consecutive coarser level space. In addition, the process should, if possible, be completely automatic, without dependence on explicit knowledge of the underlying finite element mesh. The solvers should be able to rely on as little information about the problem as possible (in extreme case, only its dis crete representation), while being able to exploit any additional information that might be available. Despite the limited input, they must be able to efficiently treat practical problems. The idea of employing a problem of reduced dimension related to the problem to be solved had been known on both discrete and continuous levels long before the advent of systematic studies of multilevel methods. It appeared in the works on the problems of economic modeling by Leontief in 1930's in a context closely related to the aggregation technique we will use in Section 3.2. 2

PAGE 17

The idea of nested iterations related to multigrid method can be traced back to Southwell in 1940's. On the continuous level, the idea is known to have been employed in solving the problems of reactor physics by Enrico H : Tmi in early 1940's and later by lVIorozov [70] and others. Despite the w ealth of the multilevel literature up until recently the known multilevel methods could not guarantee the above requirements of gen erality and mesh-independence of the method for general nonuniform meshes. For these reason s, algebraic multilevel iterative techniques are very attractive. The two-level domain decomposition offers on e solution, mixing direct solvers at subdomain level with an iterative method on the interface level. The localization of the data in these methods makes them very suitable for modern parallel com puters. Another candidate is the class of algebraic multigrid methods. These are usuall y harder to parallelize but their converg e nce properties and possibl y optimal computational complexity are very favorable. The construction of these algebraic multigrid (ArviG) and domain decomposition (DD) methods has been the focus of interest in the last years, as to which the growing list of references dealing with the topic attests (e.g., the pion e ering works leading to ArviG of the present day by Brandt, lVIcCormick, Ruge and Sti1ben [ 16, 77, 78, 82 ] and the more recent papers by l\!Jandel and Brezina [ 87, 86].) Our goal will be to design and study such algebraic methods. Focusing on the class of problems arising from the finit e element discretization of e llip tic partial differential equations, we propose several iterative solvers. Although thes e are efficient methods in their own right, we will view them mainl y as pre conditioners for the preconditioned conjugate gradient method. 3

PAGE 18

The work is organized in 7 chapters. In the rest of Chapter 1 we intro duce a model problem and some notation used throughout this thesis. \Ve also recall the preconditioned conjugate gradient method as a means of efficient solu tion of large sparse systems. Chapter 2 gives an overview of some existing state of the art domain decomposition methods which bear relevance to the methods we will devise. Here we give comments and examples relevant to our main goal of developing robust methods for problems on unstructured meshes. In Chapter 3 we design an overlapping domain decomposition preconditioner with a new coarse space. \Ve will call the new method BOSS. \Ve prove optimal convergence properties of BOSS under regularity-free assumptions on the problem and very weak assumptions on the finite element discretization mesh. The main advantage of this method is its applicability to problems discretized on a variety of types of finite elements and on unstructured meshes. The method also resolves the setup difficulties usuall y associated with overlapping domain decomposition methods. The overlapping subdomains are generated inside the solver b y simple algebraic means. Both the setup and the iteration itself are purely algebraic. One re markable quality of the method is its adaptivity; for more difficult problems, we use more extensive overlapping achieved without ever partitioning the domain into subdomains outside the solver. The asymptotic computational complexity of the method implemented on a serial architecture in the optimal case is and O(n49/J) in 2D and 3D, 4

PAGE 19

respectively. In additive setup, the method is suitable for application of inexact subdomain solvers. Chapter 4 discusses the application of inexact components in substructuring methods. The issues of the influence of approximate subdomain solvers as well as issues of approximate formulation of the method and enhanc ing the sparsity of the coarse level problem are studied here. Using the resulting method as a preconditioner in the conjugate gradient method, the condition num ber is proved to be bounded by C(l + log(H/h))2 with constant C dependent on the quality of the approximation used. Estimates for C are given. Chapter 5 describes an algebraic multilevel technique using smoothed transfer operators. The resulting method, called lVILS, is based on the same concepts as the one of Chapter 3, but no subdomain problems are solved. Instead, a combination of smoothing and coarse space correction reminiscent of the multigrid V -cycle takes place. This leads to a significantly reduced computational complexity of O(n716 ) and O(n615 ) for a serial implementation in 2D and 3D respectively as opposed to O(n4n ) and O(n49n')) in the case of BOSS. Optimal convergence estimate is also given. Chapter 6 presents the results of numerical experiments. The robustness of these methods with respect to a number of aspects has been tested by solving both artificial and real-world problems. The comparison of the results suggests BOSS to be the most robust of the three methods with respect to a number of influences, including discontinuities in coefficients of arbitrary pattern and complex geometries. Finally, Chapter 7 gives a brief summary and suggests the directions of the future implementation improvements and research. 5

PAGE 20

1.2 Formulation of the Problem and Notation Let n be a bounded domain in IRd, dE {2, 3}, with a piecewise smooth boundary an, and an = fv urN with fv,Cv disjoint, meas(fv) > o. \Ve will denote by r f), r N the parts of the boundary with Dirichlet and Neumann boundary conditions, respectively. A coarse triangulation is needed in some of the methods we will discuss. This can be achieved by decomposing n into nonoverlapping subdomains ni, i = 1, ... J, (1.1) such that ni does not cut through any element. Other forms of decomposition will also be discussed in the text whenever appropriate. Consider the model problem where Lu = f in n, U = g on r [), [Ju n = 0 on r!Y, un Lv = t. _2__ (o:(x),Hr"(x)av (x)), r,8=l axr ax" (1.2) (1.3) with the coefficient matrix (,Bn) uniformly positive definite bounded and piecewise smooth on n, smooth on each ni, and o:(x) a positive constant in each subdomain ni, i.e. rx(x) = (li > 0 for X E ni. Variation in the value of n(x) is allowed across substructure interfaces. It is possible to relax this assumption to the case when o:(x) varies moderately within each subdomain. Problem (1.2) can be restated in terms of an equivalent variational Galerkin problem u E V: a(u, v) = f(v) Vv E V, (1.4) 6

PAGE 21

where V = denotes the Sobolev space (cf. [ 72, 1]) of H1(D) functions vanishing on fvcDD, meas(fv) > C meas(DD). The finite element discretization reduces problem (1.4) to a linear systern of algebraic equations Ax= f (1.5) where A is the stiffness matrix with entries and fi = (!, is the load vector; are the standard finite element functions. \Ve assume the finite element space to be the usual conforming P1 or Q1 space [23]. \Ve define the subdomain decomposition interface J r = u ani\ rv (1.6) i = l and, for simplicity, assume that r v is the union of the closure of whole faces of some or all of the boundary substructures. \Ve associate parameters H, h with the coarse and local mesh, respectively. \Ve assume that the elements and subdomains are shape regular in the usual sense [23] It is easy to see that under our assumptions the bilinear form a(, ) sa tisfies the usual assumptions of symmetry, V-ellipticity and boundedness (cf., [23]). (1.7) \Ve note that these restrictions are assumed in order to simplify the analysis of the methods. Some of the methods discussed in the text perform well if some or even all of the aforementioned assumptions are violated. Solving problem (1.4) under these assumptions should be viewed as as our minimal 7

PAGE 22

goal; indeed, some of the discussed methods will be applicable to solving much more general problems with weaker assumptions. 'Whenever appropriate, these assumptions will be relaxed. Let 0 denote a reference domain of diameter 0(1) (e.g., square or cube in 2D or 3D, respectively) and assume that the subdomains Di are of diameter O(H) and shape regular, i.e., (1.8) with DFi the Jacobian and I I II the Euclidean IRd matrix norm. Let Vn (D) be a conforming linear finite element space on a quasi uniform triangulation Tn. of n with a characteristic meshsize h, such that each subdomain Di is the union of some of the elements, and the usual shape regularity and inverse assumption hold ( cf, [ 23]). All functions v E Vn (D) satisfy homogeneous boundary condition U = () on f D Let Vn(Di) be the space of the restrictions of functions in Vn(D) to D i Throughout this thesis C and c shall denote (unless specified otherwise) generic positive constants independent of the shape or size of D and Di. Note that these constants may depend on the constant in ( 1.8) or on the regularity of the triangulation, but they are independent of h and H. Following [11], [28] and [ 81], we define the scaled Sobolev norms 2 2 1 2 l l ullt,n; = l ult,r!; + H 2 1 u lo,n,, 2 2 1 2 llullt/2,<'1!1; = l u lt/2,<'1!1; + H l u lo,l1!1;' where 8

PAGE 23

The advantage of this definition is that it allows us to restrict all of our considerations to the reference domain 0 and use the mappings Fi to obtain the results for each ni from the obvious norm equivalence II' 1 12 I I 112 d-2 Cll 112 c u I,n, ::::: u o F i I ,l'zH ::::: -' u I,rz, (1.9) II' 1 12 I I 112 d-2 Cll 1 12 c u 1/2/l!l ; ::::: u Fi IJ2,or1H ::::: -' u 1j2,on; Assume that for each n i r D n ani is either empty or a part of ani of size bounded below by a fixed proportion of the size of ani so that the inequality (Theorem A.1.4) holds uniformly for all ni, with the constant C in dependent of h and H, 1 '2 l u l o ,on, ::::: CH 1 l u h / 2/J!l ; (1.10) for all u E v'h.(ni ) if fD n ani :f 0 and for all u E v'h(ni), .fm, u ds = 0 if r/) n ani= 0. Throughout this thesis, we will adopt the following notation: \Ve denote by o-( A) the set of all eigenvalues of A by g ( A) the spectral radius of matrix A: e(A) = max{I A(A) I }. \Ve shall use (-, -), II I I to denote the Euclidean inner product and Euclidean norm, respectively. If B is a symmetric positive (semi)definite matrix, we also use the (semi)norm derived from this matrix, i.e. 1/2 2 llxi! B=(x,x)u = \Bx,x)1 1 For a matrix A positive definite and symmetric in B-inner product, the s ymbol condu(A) will denote the condition number, condu(A) = where minimum is taken over all positive A1W A i\4 such that Am ::::: r4x ,x\B ::::: -'\..'1x,x 8 AM Vx E lRn. F<)r A = AT we obtain the usual spectral condition number cond(A) = I I AII2IIA-1II2 = where Amax and Amin denote the largest and tnln .: 9

PAGE 24

smallest eigenvalues of A, respectively. Note that for linear systems with matrix A resulting from finite element discretization of second order partial differential equations, the condition number is proportional to 1112 (cf., Lemma A.1.7). The definition can also be extended to a product of two B-symmetric positive definite operators i.e. the mutual condition number of a preconditioned system with preconditioner JVi, condu(JVi, A). In this case finding positive constants A1w ..:\.1 H such that Vx E IRn (1.11) provides an estimate and the mutual condition number is defined as the minimum of 0.M_ over all Am Am,AM satisfying (1.11). Adopting this definition, we will denote by abuse of notation = condu(JVi,A). This follows from the identity A) = and from the fact that Rayleigh quotients ((A.x, x)t }vlx x n ( l l ) and .'vi -2 ('l.!vt)-zy,y 8 hav e the same extrema. y y ll Symbol A (i) will denote the local stiffness matrix corresponding to subdomain n i x(i) the vector of degrees of freedom corresponding to all the elements in ni and N i the matrix with entries 0 or 1 mapping the degrees of freedom x(i) into global degrees of freedom, i.e., x(i) = N[x. Each x(i) may be split into degrees of freedom ;r(i) corresponding to ani n r called interface degrees of freedom and the remaining interior degrees of freedom j;(i) (thus the degrees of freedom on an n f N are considered interior degrees of freedom.) \Ve will often find it useful to split local s ubdomain matrices 10

PAGE 25

A (i) and the matrices Ni accordingl y : [ Ali) j = 11 1 2 1' I Atz Azz (1.12) where AW, are the blocks corresponding to the interface and interior degrees of freedom, respectively. Similarly, let N and 1 V denote the matrices with entries 0 or 1 mapping the degrees of freedom X on f into global degrees of freedom, i.e., X = f\:Tx and between the degrees of freedom in n \ r and the global degrees of freedom, respectively. 1.3 Direct Solvers In this section, we briefl y remark on some direct methods of solution of linear algebraic systems commonly used in practice. Although the iterative methods had actively been used in finite element computations well into 1960's, the availability of new hardware with larger storage made it possible to use direct solvers. For the last three decades, the majority of commercial finite ele-ment codes have relied on direct methods to solve linear systems of equations. lVIost of today's direct solvers proceed by first factoring the matrix A (e.g. by Cholesky or Crout factorization) and then using back-substitution. The simplicity and robustness of the direct methods have gained them acceptance in the two-dimensional finite element analysis. However, their application to solving practical three-dimensional problems of even relatively modest sizes poses a challenge for the in-core storage capacity and execution capabilities of current 11

PAGE 26

high-end computers. For the sake of argument let us consider two model prob lems: a square consisting of n x n Q1 elements and a cube of n x n x n Q1 elements. In order to solve a simple diffusion problem on our 2D model mesh, we will need to perform O(n4 ) floating point operations to obtain the factoriza tion and the storage required to carry out the computation will be words. The demands grow severely higher for the 3D model mesh, where the require ments are O(n7 ) and O(n5 ) for the asymptotic cost of factorization and storage, respectively. The asymptotic cost of back-substitution is negligible compared to the cost of the factorization, as it is and O(n5 ) for the 2D and 3D model mesh, respectively. There are direct methods requiring less work than the profile method discussed above. Optimal in terms of computational complexity is the method of nested dissection [41] requiring O(n'3 ) and O(n6 ) operations for our example in 2D and 3D respectively. Although this is less than in the case of the lexicographically ordered Cholesky factorization, it is still too expensive. Of course, our model meshes are among those least suitable for direct solvers. Indeed, there exist 3D meshes for which application of direct solvers is more than appropriate. Fi>r instance, a single strip of 3D elements may be quite easily solved by a direct solver if the degrees of freedom are properly ( re )numbered. It is intuitively clear that efficiency of direct solvers depends on the connectivity of the mesh. In order to evaluate suitability of applica tion of direct solvers Hughes Ferencz and Hallquist developed the concept of fractal dimension allowing to measure mesh connectivity (cf. [49] for details). The larger the fractal dimension of the mesh, the more beneficial application of iterative methods may be. An alternative measure allowing insight into the role 12

PAGE 27

of mesh connectivity and its influence on the performance of direct solvers is the Euclidean norm of the lengths of the skyline profiles. In this thesis we focus on iterative methods as a superior replacement of direct solvers for problems on large unstructured 3D meshes with high fractal dimension where their superiority over direct solvers can be most easily achieved. But it is the experience of the author that many of these methods prove to be worth y rivals of direct solvers even for problems with rather small fractal dimensions and in 2D. For a comprehensive study of a variety of direct methods of solution, appropriate data structures and other r elated topic we refer the reader to [42]. 1.4 Preconditioned Conjugate Gradient Acceleration This section provide s an overview of the conjugate gradient method introduced by Hestenes and Stiefel [ 46]. Although a well-known method, we find the PCG method to be of such a paramount importance that we will describe it in detail. The treatment here follows mostly the exposition of Ashby, rvianteuffel and Saylor [ 3], allowing more generality than traditional references. Also, it is the opinion of the author that this point of view is more transparent than alternative ones. Other useful di s cussions appear, among others, in Luenberger [ 60], Golub and O'Leary [24] and Golub, Van Loan [ 44]. Let A : V --+ V be a SPD operator with respect to the inner product \ ). \Ve need to solve the equation of the following type: Au= b. ( 1.13) Let n dim( V). The conjugate gradient method belongs to the family of 13

PAGE 28

polynomial iterative methods of the form 1 ui+I = ui + L 1JiJr;, j = O where ri denotes the residual in step i r i = b-Aui. From here, denoting b y u* the exact solution, we easily obtain the equation for the error ei = 'U* -'Ui where Ri +I denotes the residual polynomial (Ri+I (0) = 1) of degree less or equal to i + 1. Because Ri+I (0) = 1, w e have uo +(IRi +t (A))eo Let us denote by K i (r0 A) the Krylov subspace of dimension at most i generated b y r0 and A. That is T hus we can rewrite the last equation as (1.14) or equivalentl y (1.15) 14

PAGE 29

Equations (1.14), (1.15) define a general gradient method. Let B be a Hermitian positive definite matrix. \Ve define CG(B,A) to be the gradient method that chooses so that the norm llek+tii B is minimized over Kk +1(r0,A): (1.16) This is the optimality property of the method. Different choices of matrix B, result in different methods. Recall that Kk C Kk+I Thus for all w E Kk we have 0 (Bek+t,w) (B(ek-dk),w) (Beb w)-(Bdkl w) -(Bdklw), as (Be b w) = 0 from previous step. Therefore, In other words This defines dk up to a scalar. Thus, we can let 15

PAGE 30

where Pk E Kk+I is some con v eniently chosen vector and scalar O:k is yet to be determined. To find dk we will construct a B-orthogonal basis {PJ }J= o for Kk+I \Ve can use a generalization of the Gram-Schmidt process to the B inner product to do this: B y construction and so Po= ro i P i+t = Api 2::: O"iJPJ, j = O O"ij = (BApi,PJ) ( BpJ ,PJ) for some O:j \Ve can once again use the orthogonality ej+I B KJ+1 to determine and therefore (B(eJ-O:JPJ),pJ) (BeJ,P J ) C lJ(BpJ,P J), This valu e is always well-defined because B is assumed Hermitian positiv e definite. Note, however, that this value is expressed in term s of the unknown error ej, which raises the question of it s computability. In order to compute o:i, we will need additional assumptions on B. FN our purposes it will suffice to assume 16

PAGE 31

that B = p(A) where p(x) is an arbitrary polynomial with zero absolute term, i.e., p(O) = 0. Thus in practice, vector d j satisfying the optimality property ( 1.16) is obtained by imposing the equivalent B-orthogonality condition (1.17) The above discussion defines the conjugate gradient method CG(B,A). Different implementations of the method exist. These different algorithms are mathematically equivalent for a symm e tric and positive definit e matrix A. Also by different choice of matrix B, we recover various known methods (setting B = A, for instance y ields the original method of Hestenes and Stiefel). For a systematic study we refer to Ashby, rvianteuffel and Saylor [ 3 ] As we will mostly be dealing with symmetric positive definite problems, we only recall here the so-called Orthomin implementation because of its superior computational complexity and storage requirements. Algorithm 1 (Orthomin). For a given initial guess u0 E IRn, set ro = b -Auo Po = ro, and repeat 1 A (Be;,p;) (Xi= +-_:_:_:___:+ (Hp ; p ;)' 2. ui+t = ui + itiPi 3. = ri -i"ii Afii 4 ). ( BA.e;+l ,pj) l J -(Bpj,Pj ) 5. Pi+ I = -L; = O .BijPj ; until convergence; 17

PAGE 32

This implementation of conjugate gradient method will work for any matrix A provided all the coefficients O' i f 0. Algorithm 1 in general requires storing all previous direction vectors Pj in order to compute .Hij Under our assumption that A be symmetric positive definite, however, the recursion in Step 4. naturall y truncates so that onl y the last direction vector ne e ds to b e stored. The necessary and sufficient conditions of existence of this and other economical recursions in conjugate gradient method were studied in Faber and l VIanteuffel [35]. The advantage of looking at the conjugate gradient method in the way described above is that it simplifies the analysis of the error reduction. No te that owing to (1.15) and (1.17) we may view each step of the conjugate gradient method as eliminating the B-orthogonal Galerkin projection of the unknown error onto the current Krylov space given b y r0 and A. Since K n(ro,A) =Vanden B Kn(ro,A), we have en= 0 assuming infinitely precise arithmetic. In other words CG(B,A) converges in at most n steps. This is how the conjugate gradient method was originall y perceived. Computational experiments have shown that the round-off error can cause loss of orthogonalit y in practical applications. That is why today the method is primarily viewed as an iterative rather than direct one as first suggested by Reid [75] It is oft e n useful to replace the problem Au = b by a related problem .:Vi-1 A u = .:vt-1 b, where .:Vi is a symmetric positive definite matrix. l\rJatrix .:Vi is called a preconditioner. Matrix .:vt-1 A is no longer symmetric in the (-,-) inner product, but we note that it is symmetric with respect to the energetic inner 18

PAGE 33

product (A,-) and with respect to inn e r product ) Therefore, we can now apply the conjugate gradient method to the matrix il = A and b = b. B y doing so, we arrive at the preconditioned conjugate gradient method (PCG). T he following is the Orthomin implementation of PCG: Algorithm 2 (Orthomin implementation of PCG). For the given initial guess 'Uo E mn, set ro = b -Auo, = ro, repeat 1 A \Be;,p;) (Xi = J ) \Bp;,p; 2. 'Ui+1 = 'Ui + iXiPi, until convergence; Note that the computational complexity of this algorithm differs from the Orthomin algorithm without preconditioning by the cost of the additional preconditioning step 4. The following theorem offers a motivation for employing a preconditioner. For simplicity, w e will restrict the proof to the case of the preconditioned conjugate gradient method of Golub and O'Leary [ 24 ] i.e., we set B =A. Theorem 1.4.1. For error in step k of the conjugate gradient method applied to syste m -1 Au = -1 b, it holds that k -1) llu-::; 2 V . llu-uoiiA + 1 ( 1.18) 19

PAGE 34

Proof. Let us set r0 JVi-1 (b Au0). As 'Uk is a A-orthogonal projection, we have Now for an arbitrary polynomial Pk-1 of degree k -1, taking we have \A(uuk), u-uk) < min(A(J-JVi1 APk-1(JVi1A))(uuo),(I JVi1APk-1(JVi1 A))(a uo)) Pkl < min \Aqk(JVi1 A)(uu0), qk(JVi1 A)(uu0)) qk(0)=1 < min max !qk(A)!2\A(u-u0), u -uo). q k (0) = 1 ..\EO"(.A-1 1A) It is a well-known result of approximation theory that the solution of this minimax problem can be found in terms of the Chebyshev polynomials scaled to the interval of the spectrum [Amin(JVi-1 A), Amax(Jvt-1 A) ] \Ve can then deduce that from where (1.18) follows. D By setting B = A, JVi = I we recover the error estimate for the original conjugate gradients method of Hestenes and Stiefel. 20

PAGE 35

\Ve notice that this estimate for the convergence rate of the conjugate gradient method depends strongly on the condition number. Namely, the num ber of steps required to reduce the error by a given factor is proportional to ylcond(.:Vi, A). Even though the estimate (1.18) often turns out to be rather pessimistic (e.g., if the matrix A has clustered eigenvalues), we can see that the selection of the preconditioner .:Vi is crucial to the performance of the method. Theorem 1.4.1 thus gives an answer to two questions: It shows that the addi tional cost of preconditioning can be justified and it clarifies the sense in which the preconditioner .:Vi should be related to A if good convergence is to be guar anteed. Based on these facts, we will seek preconditioners such that the problem .:Viz = r is more easily obtained than that of Au = b. This will guarantee that the cost of applying each iterative step of (PCG) will be small. \Ve will want .:Vi to approximate spectral properties of A in the sense that cond(.:vt-1 A) is small. This will guarantee that the number of steps necessary to reduce the error to a required size will be small. The quantity cond(.:Vi, A) = cond(.:Vi-1 A) will serve as a measure of quality of a preconditioner. \Ve have shown that preconditioning can greatly improve the conver gence properties of the conjugate gradient method. Our work will consist in finding suitable preconditioners. Sometimes these preconditioners are explicitly available in the matrix form, but most often, we will find it convenient to con struct iterative methods to compute approximation of A -1 rk needed in step 4. of Algorithm 2. These iterative methods, if used as a preconditioner, improve 21

PAGE 36

the convergence rate of PCG. \Ve can also adopt a dual understanding of preconditioning with respect to the linear symmetric iterative methods, namely that PCG is often viewed as a means of acceleration of these methods. In order to justify this claim, let us consider a linear iterative solver in the form (1.19) consistent with the problem Au = b, i.e., having the same solution as Au = b. -Here N is assumed symmetric, hence 1\1 is A-symmetric, because it follows from the consistency and the fact that A is SPD that A-1b = J VJA-1b + Nb Vb, so --1\1 =I-NA and (A1Hx, y) = (A(INA)x, y) =(Ax, (INA)y) =(Ax, 1\iy). Note that all the classical splitting type iterative schemes such as Jacobi, as well as the variational multigrid methods may be written in this form (see Section 4.2 for more details). The consistency condition allows us to write alternatively hence the method may be viewed as preconditioned Richardson iteration with preconditioner .:vt-1 = N. This process converges provided that This means q = I I I N Ali A < 1. Since we can easily verify that 1+q cond(NA)::; --, 1 q 22

PAGE 37

we can see that application of preconditioned conjugate gradient method with preconditioner N (in other words, the preconditioner given by one iteration of the process (1.19) with x0 = 0) yields a superior convergence factor, as Jcond(N'A) 1 Jcond(NA) + 1 1J1q2 <
PAGE 38

2. Overview of Some Domain Decomposition Methods In this chapter, we will give an overview of some known domain decom position methods. Because the domain decomposition field has grown very large since the first studies this overview cannot be exhaustive. Instead we list a few methods we will draw on in the derivation of our new methods. Some methods are presented with the intention to contrast the differenc e s with our approach. \Ve also attempt to direct the reader to the appropriate sources dealing with related topics not covered in this thesis. 2.1 Abstract Schwarz Methods In this section, we outline the theory of Schwarz methods a useful tool for analyzing man y domain decomposition methods. In recent years a lot of effort has been put into a systematic study of various types of the domain decomposition methods, generalizations of the alternating method of Schwarz [80]. Some of the pioneers of this work include, among others, and lVIandel [5] Bramble Pasciak and Schatz [10], Dryja, Proskurowski and \Vidlund [29] lVIatsokin and Nepomnyaschikh [71] The Schwarz methods form a class of iterative methods for solving problems arising from discretization of partial differential equations. These methods can 24

PAGE 39

be fully described by dividing the original solution space V of (1.4) into sub spaces Vic (Di), such that Vv E V: v = v1 + v2 + ... + VJ, wherev i E Vj, (2.1) and defining bilinear forms ai(., .) on Vi x Vj. These forms are assumed coercive and bounded on vi with respect to a(, )-norm. These assumptions will be rigorously formulated in Theorem 2.1.1. Note that the decomposition in (2.1) may not be always unique. The symbol denotes the so called coarse space, whose main purpose is to expediate the global correction of the error. Sometimes this space fulfills other roles, as we will see in Section 2. 7. The solution in Schwarz methods is iteratively corrected by adding the approximaton of the current error on subspaces V j To this end we define operators 7i : V ---7 V i as a i (Titv, v) = a(tu, v) Vv E Vj, i = 0 ... J. (2.2) The operators 7i are thus defined uniquely up to the poss ible shift in the kernel of ai(-, ). All the methods formulated in the text will be indep e ndent of this shift. Note that if we set a i(u, v) = a(u, v), Vu, v E Vi, 7i is an a(., .)-orthogonal projection onto Vj. This is why we refer to 7i as to generalized projections. Using the operators 7;, a variety of methods may be constructed. They differ in several aspects. The subspac e s v i may correspond to overlapping or nonoverlapping subdomains DieD allowing unique or non-unique representation of a function u E V r e sp e ctively. Schwarz methods can naturally be classified as multiplicative or addi tive, based on the manner in which the correction of the error is done. The 25

PAGE 40

former is done following lines similar to the Gauss-Seidel method, while the lat-ter is reminiscent of the Jacobi method ( cf. [ 94, 55]). In either case, we can take the benefit of the fact that the (a pproximate) projection of the error 'U -u can be attained without the knowledge of the exact solution u*, using the definition of 7i: ai(Ti(u*u), v) = a(u*u, v) = f(v)a(u, v), Vv E V j (2.3) The multiplicative Schwarz method starts from u = 0 and proceeds by updating the current approximate solution 'U+-u -'Ui, where ui = 7i (u -u) E Vi, i = 0, ... J is computed from (2.3). This process results in a nonsymmetric operator. If the method is to be used as a preconditioner in PCG, it is desirable to work with s y mmetric operator instead in which case one can apply the updates once in forward and once in backward order. The additive Schwarz method is defined as J J uk+t = 'Uk + 'UJ = uk + 7 L 1j('U* -,uk), (2.4) j = O j = O where 7 is an additional correction parameter. "With the operator C defined by J CA = LTi = T, (2.5) i = O we can rewrite the method (2.4) as (2.6) Hence, we may view operator Cas an approximate solver to Au= f, defined by J J c f = L 'Ui = L Tj'U, Ui E Vi' i = O i = O 26

PAGE 41

where Application of (2.6) by itself may be fruitful if 7 is chosen so that e ( TCA) ::; 2. A more common approach, however is to use Cas a preconditioner in the conjugate gradient method for problem (1.4). Then we obtain = (J(CA) = (J(T), i.e. the convergence properies of the method may be studied as the spectrum of the sum of the approximate projections Tj. This path has been taken by and lVIandel [5], Dryja, Smith and \Vidlund in [30], [32] and in a different setup by Xu [94]. The following theorem summarizes the results of their research. Theorem 2.1.1 (Dryja, Widlund [32]). Let T = 'LL1 7i and ( i) there exists a constant C0 such that for all u E V there exists a decomposition U = J.:-/= o Ui, 'Ui E vj, such that J """" 2 L ai(u i, u i ) ::; C0a(u, u) i = O (ii) there exists a constant w such that for i = 0, ... J (iii) there exist constants sih i, .i = 1, ... J, such that Then, T is invertible and C02a(u, u)::; a(Tu, u) ::; (g(<:) + l)wa(u, u) VuE V. (2.7) 27

PAGE 42

Here e( t) denotes the spectral radius of the matrix t = {Eij }L = t Proof. The theorem will be proved as an application of the more general framework of Section 4.3.1. Another approach may be found in [94] 0 Remark 2.1.2. Assumption (i) of Theorem 2.1.1 by itself guarantees that inf O"(T) C02 This result is known as the generalized Lions' lemma. \Ve will need it in its original form in Section 3.1, where it is stated as Lemma 3.1.8. The scale of methods is not limited to additive and multiplicative; one may combine them together. For instance Cai [18] proposes to apply the mul tiplicative approach to the local solves (i = 1, ... J), while treating the space in additive fashion with respect to V1 vj. This allows solving the local problems in parallel with the coarse space problem. The local problem solving takes advantage of the more rapid convergence of multiplicative methods. Such an approach, however, inhibits parallelization of the local solves. Another pos sibility ( cf. lVIandel [ 64]) is to treat the local problems in additive way while a coarse problem of modest size is added in a multiplicative fashion. This approach seems preferable because all the local problems may be solved in parallel and the multiplicative coarse grid update is more efficient than an additive one. 2.2 Overlapping Schwarz Possibly the most studied of domain decomposition type methods is the overlapping domain decomposition method ( [ 80 ], [59], [31], [27]). In this method, the original domain n is covered by a collection of overlapping subdomains. The additive or multiplicative update of the solution described in previous section are then applied. Using the definition of (generalized)projectors 7;, the error 28

PAGE 43

propagation operator of the multiplicative method can be written as (ITo)(I-T;)(I-0-t) (I-Tt), and the error propagation operator corresponding to the additive one is J I 7 "'E;Ji. i = O Overlapping methods with a coarse space usually offer an improvement over the optimal non-overlapping methods in the sense that in the overlapping case with a coarse space the condition number can typically be bounded ( cf. [ 33]) as H cond(.:\1-1A) :::; C(l + 8 ), where S denotes the size of overlap of the subdomains. This better estimate comes at a price, though as for three dimensional problems even a small overlap amounts to a significant increase in size of the local problems. This fact is especially pronounced if the subdomains are small. Another disadvantage of these methods is that their setup is usually not very flexible. Typically when generating a system of overlapping subdomains, one starts from a system of nonoverlapping subdomains and adds new layers of elements around each of them. This is a cumbersome procedure and also one of the reasons why substructuring methods seem to enjoy more popularity today. Yet another issue troubling practical usability of overlapping Schwarz methods is the construction of a coarse space. For unstructured meshes it is difficult or impossible using traditional geometric approach to construct an automatic procedure yielding a coarse space that would be a subspace of the original finite element mesh; this presents a technical difficulty of having to deal with nonnested spaces ( cf. [ 21, 20] .) 29

PAGE 44

The advantage of overlapping additive Schwarz methods is that they easily lend themself to application of inexact subdomain solvers. In Chapter 3 we will design and analyze an efficient black-box over lapping Schwarz method with a new coarse space avoiding the pitfalls of the traditional approach. 2.3 Non-overlapping Decomposition Methods As noted in the previous se c tion, the application of the traditional over lapping domain decomposition methods requires an undesirabl y complex me s h partitioning stage if the desired measure of overlapping is to be guaranteed. It is diffi cult to design automatic partitioning of complex geometries unl ess an algebraic concept similar to one described in Chapter 3 is used. Nonoverlapping techniques simplify the partitioning stage significantly, being able to u s e a variet y of existing aggregation or gre edy algorithms [37 38]. 2.4 Methods Reduced to Interface lVIany of the substructuring methods are bas ed on reducing the original problem Ax= b to a problem defined only on the in t erface r common to at least two substructures by eliminating all degrees of freedom not associated with any subdomain inter face. The initial problem is thus reduced to one whose unknowns are only the interface degree s of freedom. A s the interior degrees of freedom corresponding to different subdomains have no coupling in the stiffness matrix A the elimination can be done locally on each subdomain and in parallel. 30

PAGE 45

Let A (i) denote the stiffness matrix of the bilinear form ai ( ) defined on V X V which is the restriction of the bilinear form a(,) to subdomain Ui, a(u, v) = 'f:./=1 ai (u, v). For our model problem given by operator (1.3) this is the stiffness matrix with entries Ak(it) = Ld _1 }' 1 .,. dx, where T,S. L, U X s U X-r nodes vk, v 1 lie in n i The stiffness matrix A can be obtained by the standard process of subassembly J .j = """"' "(\! .. :l (i) "(\!:} _..rt_ 1\1.._,. J.\'l i = 1 The matrices S(i) resulting from the elimination of the interior degrees of freedom of A(i) are the local Schur complements. \Vriting the local stiffness matrices A(i ) in the block form l .j (i) .c(i) j i (i) = .""1.11 .""112 A T .j (i) 1 (i) .""112 .""122 where A}V corresponds to the subdomain interface and to the interior the local Schur complements have the following form: c(i) : di) :1 ( i) :1 (i) -1 .. d i) cJ .""ill .""112 .""122 .""121 (2.8) The Schur complement 8 corresponding to the global stiffness matrix A can then be also obtained by the standard process of subassembly \Ve can now solve the problem J c """"' ;\r .. c(i) ;\r"l. -J.\i i = 1 Sx = b, (2.9) (2.10) --1. where b = b A12A22 b is the reduced right-hand side obtained by the elimination of the interior degrees of freedom. rviethods based on solving Schur complement 31

PAGE 46

problems like this one instead of solving problems with the original stiffness matrices are labeled as substructuring methods by some authors. \Ve will, however, use the term substructuring methods for a broader class of nonoverlapping methods, including the two-grid methods based on transfer operators with multilevel smoothing describ e d in Chapter 5. One advantage of the reduced methods is alread y obtained by solving the problem with Schur complement, which is significantly better conditioned than the original matrix A ( cf. Lemma A.l. 7). It also has many fewer unknown s The earl y engineering applications of substructuring of 1960's wer e solving (2.10) by a direct solver (cf. [74]). Today, the reduced system (2.10) i s usuall y solved b y a preconditioned conjugate gradient method; the preconditioner is typically based on solution of the local subdomain problems. Throughout the text we will find us e ful the concept of discrete harmonic functions. These are defined by In matrix form, this is equivalent to (iL+ idi L () Iiz1 X IizzX (2.11) i = 1, ... J. ( 2.12) The discret e harmonic functions are uniquely determined by their interface values i:. Discrete harmonic functions are closel y related to the Schur complement in the following sense: if xis the discrete harmonic extension of i:, then from (2.12), i = A2] A21 i: and simple manipulations yield xTAx = i:T8i:. 32

PAGE 47

In addition, the following l emma demonstrates that the discrete harmonic extensions (hence also the Schur energy norm) have the lowest energy among all vectors having the same values on the in t erfaces. Lemma 2.4.1. Let A be a symmetric parti-tioned into its interface and interior components A = and let S T A t 2 A22 denote the Schur complement of A. Then Proof. Given y s uch that f) = i:, let us solve the following problem: T T T Find w such that ti: = 0 and iJ A22ti: = iJ A21 f) iJ A22Y Vv As A22 is nonsingular, such w exists and satisfies the equival ent minimization of the functional 1 T T, F (v) = 2 v Av + y A v over all vectors with D = 0. As this constraint is trivially s atisfi e d b y the zero ve ctor, w e have F (v) 0. l'vioreover, the function z = y + w is the harmonic extension of i:, as z = i: and T T T z Au = y A v + w A v = 0 Vv E { u : u = 0}. Thus we obtain (A z z) = (A(y +tv), y +tv)= (Ay, y) + 2F(' w ) (Ay, y). From here the statement follows by the definition of Schur complement. D 33

PAGE 48

Figures 2.4, 2.4, 2.4, 2.4 and 2.4 depict the harmonic extension of simple functions prescribed on the interface; the functions exhibit the "smoothing') effect of discrete harmonic extension following from Lemma 2.4.1. Note that in certain situations the trivial extension may coincide with the discrete harmonic extension (cf. Figure 2.4). It is well known [93 ] that in order to achieve good convergence prop erties, the preconditioner has to be augmented with a mechanism of global ex change of information. Let us note that another advantage of nonoverlapping methods is that on unstructured meshes it is much easier to algebraicly construct a coarse space for them than for the overlapping methods (cf. [63]). Treatment of subdomains with nonmatching grids using currently popular so called mortar elements, is also easier and was recently studied in [4], [ 2 ] [53 ] [ 57], [19] and many others. rviortar elements form a family of nonconforming finite element methods. They are known to be as accurate as their conforming counterparts, while offering more flexibility in refinement. The fine meshes on different subdomains do not have to match over the interface, even the subdomains themselves do not have to form a conforming coarse mesh. For the study of these methods we direct the reader to the above references. 2.5 The Neumann-Neumann Method The original Neumann-Neumann preconditioning operator lvt ?v;v by De Roeck and Le Tallec [ 26 ] is based on the assembly (2.9) of the global Schur complement from the local subdomain Schur complements. It seems natural to precondition the sum (2.9) by the sum of properly weighted inverses of S (i)-1 34

PAGE 49

0.8 0.6 0.4 0.2 0 1 0.8 0.6 0.4 0.2 0 1 . .... .... .. : ." ... ... :. ... . ... .. ...... . . . . . . .: .... 0 0 Figure 2.1: Harmonic extension of a comer peak. ..... . ........ . . . . ... ...... . . . 0 0 Figure 2. 2: Harmonic extension of a peak on the edge. 35

PAGE 50

0.8 0.6 0.4 0.2 0 1 0 0 Figure 2.3: Harmonic extension of a function constant on one edge 0.8 0.6 0.4 0.2 0 1 . .... -__ .-__ ... . :---. . -. 0 0 Figure 2.4: Harmonic extension of a function linear on one edge. 36

PAGE 52

If all subdomains are similar, this yields an efficient preconditioner, as ( J J ) J cond 8(i)-1 ) = 8 (J)-18(i));:::; 0 ( 1). The application of inverses of the local Schur complement matrices corresponds on the continuous level to solving local Neumann problems on the subdomains. This is why preconditioners of this type are called Neumann-Neumann. The Neumann-Neumann preconditioner can be written as follows. Algorithm 3 (Neumann-Neumann preconditioner, [26]). Given r E V compute z = .:vt;v;vr as follows. 1. Distribute r to subdomains 2. Solve the local problems c(i).u . "1, -11(2.13) on the su bdomains. 3. Average the results by J z = 2::: NiDi ui. i = l The weight matrices Di are an important component of this algorithm allowing to preserve good conditioning even when the subdomains differ in certain aspects, such as the values of coefficients of the continuous operator. Some appropria t e choices for Di were discussed in [26] and [ 66]. \Ve will mention the choices suitable for problems with large variation of coefficients across the interfaces of the su bdomains in Section 2. 7 and also in Chapter 6. Since for the model problem with operator (1.3) the local Schur com plement matrices S (ii) are typically singular, De Roeck and Le Tallec [26] used 38

PAGE 53

an "approximate" LU decomposition obtained by replacing zero pivots in the Gaussian decomposition by positive values. This algorithm fails to provide any means of global distribution of information beyond that of block Jacobi type and, as expected, suffers from steep dete rioration of convergence as the number of subdomains increases. 2.6 EBE method In engineering practice the solver is often very closely tied to the finite element software. One of such methods of the nonoverlapping domain decomposition type is the element-by-element method originally proposed by Hughes Levit and Winget [50] and further studied by Hughes and Ferencz [48] and Tez duyar and Liou [84], [85]. This method, which may currently be a becoming industrial standard in the field of iterative solvers applied to nonlinear problems assumes that the element stiffness matrices are available. Every element is treated as an independent subdomain. The method is used as a preconditioner in the conjugate gradient method. The main idea of the method is simple: The element matrices are factored and the preconditioner applied element by element based on these factorizations. The preconditioner may be written as .:Vi = diag(A)1 / 2 (rr L()i(i))) (rr D()i(i))) (. ll U()i(i))) diag(A)112 1 = 1 1 = 1 where net denotes the number of elements in the system and L, D U denote the factors of the Crout factorization of a matrix, )i(i) = L(A(i))D(A(i))U(A(i)). Here all the element matrices are understood as embedded in an n x n identity matrix. The existence of the factorization is guaranteed because of the use of 39

PAGE 54

positive definite local stiffness matrices .4. (i) obtained from A (i) by scaling and regularization .4_i =I+ diag(A)-112(Ai diag(A ( i)))diag(A)-112 As pointed out by \Vathen [91], the method is essentially a multiplica tive version of the original Neumann-Neumann method described in Algorithm 3. The difference is that the subdomain stiffness matrices in EBE are first scaled and regularized. The chief advantage of the EBE method is its very low setup overhead compared to the typical two-level domain decomposition or multigrid methods. T he method is very useful in treatment of nonlinear or time dependent problems, where refactorizations need to be p e rformed. The cost of factorization of local element stiffness matrices is amortized over the iterations per timestep and over the number of timesteps but because of the low factorization overhead due to the small size of element matrices, it is possible to perform refactorizations more often, which usually leads to faster overall convergence. The potential of this method is most visible in case of three-dimansional problems with low fractal dimension of the mesh, where direct factorizations are extremely inefficient. The whole procedure is quite suitable for implementation on parallel architectures. This can be achieved by grouping the elements into noncontiguous subgroups. Although the subgroups have to be processed sequentially, the local stiffness matrices can then be factored in parallel for each element within a given subgroup. Thus, both factorization and the action of the preconditioner can be computed in parallel using this grouping of elements. The obvious defficienc y of the method is the lack of coarse space ex change of information. Despite this, computational experiments suggest that 40

PAGE 55

EBE is suitable in application to nonlinear problems, where finding simple preconditioners is critical, and where EBE proved superior to diagonal precondi-tioning in terms of storage, convergence propert ies and sensitivity to penalty-type boundary conditions ( cf., e.g., [48}.) Another shortcoming, preventing black-box implementation, is EBE's reliance on the availability of the finite element data. In Chapter 3 we present a fully algebraic method with a coarse space, somewhat similar in spirit to EBE, with much improved convergence rate (independent of the number of subdomains) proved under regularity-free assumptions for problems on unstructured meshes. 2. 7 BDD as an Algebraic Method The BDD preconditioner [ 62 } is based on the original Neumann-Neumann preconditioner by De Roeck and Le Tallec [26} described in Section 2.5. The algorithm proposed in [ 26 } suffers from the lack of global distribution of information resulting in dramatic deterioration of performance with increasing number of subdomains. As noted in Le Tallec [55}, the method becomes impractical when applied to problems with the number of subdomains larger than about 16. In order to defeat this drawback, lVIandel [62} added a coarse problem as follows. Let ni = dim vi, 0 ::; mi ::; n i, and Zi be n i x m i matrices of full column rank such that i = 1, ... J (2.14) and let VV c V be defined b y J Tl' = {v E V v = L FliDiui, 'Ui E Range Zi} i = l 41

PAGE 56

The space will play the role of a coarse space very much like in variational multigrid methods. \Ve say that 8 E V is balanced if i = 1 ... J. (2.15) T he process of replacing r by a balanced 8 = r -8w, w E VV, is called balancing. \Ve are now r eady to d e fine the action r >---7 z = .:vt-1r of the BDD preconditioner. Algorithm 4 (BDD preconditioner, [62]). Given r E V compute .:Vi-1r a s follows. 1. Pre-balance the original residual by solving the auxiliary probl e m for unknown vectors )..i E IRm', i = 1, ... ,J. (2.16) 2. Set J 8 = r 8 "' NJDJZJAJ, ( 2.17) j = l 3. Find any solution ui for each of the local probl e ms c(i).u "' J..J 1 i = 1 ... ,J. ( 2.1 8) 4. Post-balance the residual by solving the auxiliar y problem for P i E IRm', J Z[D[f\r{(r8 "' JVJDJ('uJ + ZJJ.lj)) = 0, i = 1, ... J. (2.19) j = l 5. Average the result on the interfac e s according to J z = L f\ri Di(ui + Zilt i). ( 2.20) i = l 42

PAGE 57

If some mi = 0, then Z i as well as the block unknowns fti and Ai are void and the i-th block equation is taken out of (2.16) and (2.19). An important design choice for both the original Neumann-Neumann preconditioner and for BDD is the selection of weight matrices D i that form a decomposition of unity on the interface space V, J """" T r T L i\iDii\i = I. i = l (2.21) The most straightforward choice for Di is a diagonal matrix with the diagonal elements being the reciprocal of the number of subdomains the degree of freedom is associated with. A better choice, which also guarantees a convergence bound independent of coefficient jumps between subdomains is given in Theorem 2. 7.3 below. For other possibilities, see [26] and notes in Chapter 6. The presence of the coarse problem guarantees that the possibly singu-lar local problems (2.18) are consistent. Indeed, after the pre-balancing Step 1. of Algorithm 4, the vector s NiDiZi Vi = 1, ... J. Therefore s i = D[N[s The presence of the coarse problem also guarantees that the result of the algorithm does not depend on the choice of the solutions of (2.18), see [ 62]. In practice, the residual of the initial approximation should be balanced first as in (2.19); then the first balancing step (2.16) in every iteration can be omitted since the residual r received from the conjugate gradients algorithm is already balanced. The addition of coarse spac e results in a very robust method with op-timal substructuring convergence properties, with the condition number of the preconditioned operator bounded by C ( 1 + )2 as we will demonstrate. 43

PAGE 58

lVIandel [63] proved the following abstract estimate as the main tool for convergence analysis of BDD. Theorem 2. 7.1. Algorithm 4 returns z = jvt-1r, where jvt IS symmetric positive definite and cond(jvt, S) s; C, where c = (2.22) Note that a different proof of Theorem 2.7.1 can be found in Chapter 4.4.2. For the reader's convenience we sketch the theory leading to the optimal substructuring condition number estimate. Our exposition follows lVIandel, Brezina [66]. For more detailed analysis as well as computational experiments using BDD, see lVIandel, Brezina [ 65 ] In applying the bound of Theorem 2. 7.1, we will find useful the concepts of glob and glob projection, defined as follows. Definition 2. 7.2 ([ 66]). Any vertex, edge, and, in the 3D case, face, of r will be called a glob. A glob is understood to be relatively open; for example, an edge does not contain its endpoints. \Ve will also identify a glob with the set of the degrees of freedom associated with it. The set of all globs will be denoted by 9. For a glob G E 9, define the glob projection as follows: for a vector u E V, Ecru E V is the vector that has the same values as u for all degrees of freedom in G and all other degrees of freedom of Ecru are zero. The glob projection in terms of the local degrees of freedom is Eij = fv}'Ec:Ni : Vi --+ Vj. 44

PAGE 59

Note that any two distinct globs from g are disjoint and it holds that J r = U DDi \ DD = U G. i=1 0 E 9 The mappings E0 correspond to zero-one matrices and satisfy LEo= I, (2.23) GE9 and (2.24) \Ve are now ready to give an abstract bound in the case when the matrices S( i ) are scaled by arbitrary positive numbers o;i This corresponds to coefficient discontinuities of arbitrary size between the subdomains. The theorem is formulated and proved exclusively in algebraic terms. Theorem 2.7.3. Let O:'i > 0 i = 1 ... ,J, t 1/2, and EiJ, N i S(i), and Z i satisfy (2.8) (2.23), and (2.14). Define D i as the diagonal matrices Di = L d(i, (2.25) G:Ef:j # O and assume that there exists a number R so that for all i, .i = 1, ... J and all 1 I ji 1 2 1 I 1 2 I Eo uil <;( n s; -Rl uil <;(i l O:j O:i (2.26) for all u i such that u i Ker (S( i )), S(i)ui -Range Z i Then the weight matrices Di form a decomposition of unity (2.21), and the preconditioner defined by Algorithm 4 satisfies cond(.A1, 8) s; K2L2R, (2.27) where K = max i l{i lVJ'lVi f 0}1 and L = maxi,) I {G ElJ f 0}1 45

PAGE 60

Proof. The decomposition of unity property (2.21) follows from the definition (2.25) and from (2.23), J J :LN[Diflli = L L d(i,G)Eo =LEo= I. i = l i=I G E 9 Let j be fixed. Due to the bounded intersection of the subdomains, there are at inequalities imply that and (2.28) If Ei] f 0, the coefficient d(i, G) from (2.25) satisfies d(i, G)::::: + rxj), and it follows from (2.23) and from (2.26) that Therefore by (2.28), which concludes the proof, in view of Theorem 2. 7.1. D Note that the constant K is the maximal number of adjacent subdomains n j to any subdomain ni plus one, and L is the maximal number of globs in any ani n anj. If t > 1/2, the estimate (2.27) can be slightly improved; in 46

PAGE 61

particular, if t = 1 analogously to the method of De Roeck and Le Tallec [26], one has cond(J\.-1, S) S:: K2 L2 R/2. To apply Theorem 2.7.1, we first need to replace the S(i ) norm by the scaled H1i2 norm. This is a standard result [ 10, 31, 92], which we state here for reference in an O'i-scaled form suitable for our purposes. Lemma 2. 7 .4. There exist constants c > 0, C independent of H or h so that 2 1 2 2 c !u!ti2.0n S:: !lu!l s ( n S:: C !u!ti2.0n O'i . To derive the fundamental inequality ( 2. 26) assumed in Theorem 2. 7 .3, we identify (by abuse of notation) V with i:'h.(r) and i:i with Vn(ani). Then the glob projections are E0 : V11(f) --7 V11(r), and (2.26) becomes a bound on the increase of the H 1 i2 norm when a function in V11 (ani) is changed by setting its values to zero on all nodes of ani \G. \Ve first consider the two-dimensional case, n c IR2 Since ani is onedimensional, we may use the properties of the space V11(0, H) of piecewise linear functions on a uniform mesh with step h on the interval [ 0, H]. The following form of Discrete Sobolev Inequality was proved by Dryja [28]. Lemma 2. 7 .5. There exists a constant C such that l lull iP>(O,H) S:: C ( 1 llull7fl/2(0,H) V-uE Vn(O, H). \Ve will also need the following bound for the H1 i2 norm of the extension by zero from an interval to the whole IR, proved by Bramble, Pasciak, and Schatz [10, Lemma 3.5]. 47

PAGE 62

Lemma 2. 7 .6. There exists a constant C such that for all u E v'h. ( 0, H) satisfying u(O) = u(H) = 0, u = 0 outside (0, H), 2 ( H) 2 2 lult/2,1R::; C 1 +log h l lullu'"'( O H ) + lult J2,(0,H) An estimate of the H112 norm of a function, obtained by sampling the value of a given function at one point, follows easily. Lemma 2.7.7. ThereexistsaconstantCsuch thatforallu E Vn( O,H) 0::; h::; H and v0 E V'h(IR) defined by u 0(0) = u(O), v0(x) = 0 for J x l :::::> h, 2 ( H) 2 lvoh;2 ,1R::; C 1 +log h l lullt/2,(0,H) Proof. Let L = Jlui!Loo(o,u) It follows from Lemma 2.7.6 that 2 ( . 2h) 1 1 2 2 lvoltJ2,1R::; C 1 +log h I voiiP(-h,h) + lvolt/2 ,(-h,h) (2.29) Using linearity of u0 we obtain 2 Jh Jh lvo(s) v o(t) 1 2 2 lvoh;2 (-h ,h) = I l 2 dsdt::; 4 L -h -h 8 t (2.30) II 1 1 2 I ( ) 12 2 . J 12 c 2 2 because uo L ""'( -h,h) = vo 0 ::; L Thus, uo 1 ;2,( -h,h ) ::; ""L But L < C(1 )Ji ullf; 2 ,(o,u) by Lemma 2.7.5, which concludes the proof. D By subtracting such spikes at the endpoints, we can extend Lemma 2. 7.6 to the case when the values of u at the endpoints are nonzero. Lemma 2. 7.8. There exists a constant C so that for u E v'h.(O, H) and wE v'h(lR) such that w = u on [h, H -h], and tu(x) = 0 for X::; 0 X:::::> H 2 ( H)2 2 lwh ; 2 ,1R ::; C 1 +log h llull t / 2 ,(0,H) 48

PAGE 63

Proof. Defin e u(x) to be zero for x E ( oo, -h) U (H + h, oo) and linear in [ -h, OJ and [ H, H + h]. Further, define u0 and Vu by { u (O), vo(x) = 0 v0 linear in [-h, O ] and in [ O,h ] { 'U(H), vu(x) = 0 X= 0, lxl2' h, x=H, lx-Hl2' h V u lin ear in [ H -h H] and in [H, H + h]. Writing w as w = u v 0 vu, and applying Lemma 2.7.6 and Lemma 2.7.7, w e obtain 2 ( . H) I 1 2 2 lwlt/2,1R::::: C 1 +log h I tul L00(0,H) + lwlt/2,(0,H) ( H) 2 2 = c 1 +log h lluiiLoo(O,H) + lwllj2,(0,H) ( H) 2 2 2 2 ::::: C 1 + log h II'UIILoo(o,u) + 3(1'Uit/2,( 0 ,H) + l volt/2,1R + lvuh/2,1R) ( ( . H) 2 2 . H 2 ) ::::: c 1 + log h l l uiiLoo(O,H) + lul tj2 (0,H) + ( 1 +log h) llulltj2,(0,H) Application of Lemma 2.7.5 to the term II'UIILoo, (o,H) concludes the proof. D \Ve now have the tools to estimate the H112 norm of the glob projections E0 This s hows that an arbitrary function in v'h.(DDi) can be de c omposed into i t s glob parts with onl y a small penalty of H112 norm increa se. Theorem 2.7.9. Let D c 1Rd, dE {2, 3}. Then there exists a con stant C not dependent of h or H so that for an y glob G E 9 and for all u E Vn.(DDi), 2 ( H)2 2 I IE o ullt/2Jm, ::::: C 1 + log h llull t/2,0 49

PAGE 64

Proof. In the 2D case, the proposition follows by using a mapping of ani onto an interval so that G maps to ( 0, H), from Lemma 2. 7.8 for G being an edge and from Lemma 2. 7. 7 for G being a vertex. In the 3D case, the proposition was proved for the case of G being a face of ani as Lemma 4.3 in [ 12 ] In the case of G being an edge or a vertex of ani, the proof follows from Lemma 4.2. and the proof of Lemma 4.1. of [12]. 0 The bound on the condition number of the BDD algorithm follows. Theorem 2. 7.10. Let n C 1Rd, d E {2, 3}, and the weight matrices Di be diagonal with the entries given by (2.25). Then there exists a constant C independent of H, hand rli so that the condition number of the BDD method satisfies H 2 cond(.A1, 8) :::; C (1 +log-) h Proof. \Ve only need to verify the assumption (2.26) of Theorem 2.7.3. Lemma 2.7.4 allows to replace the S ( i ) norms by the H112(an i ) seminorms, which may in turn be replaced by the H112(ani) norms owing to the inequality (A. 7) Now it suffices to use Theorem 2. 7.9. 0 2.8 DD as a 2-level Multigrid Another method similar to BDD was proposed by IVIandel [64]. As in BDD, this method also treats the local problems in additive way while a coarse problem of modest siz e is added in multiplicative fashion. The main difference between the two methods is that BDD requires subdomainlocal stiffness matrices as input, whereas the method of rviandel described in [ 64] uses submatrices of A instead. 50

PAGE 65

The method may be written as follows: Algorithm 5 (Hybrid Method). For a given r E V, compute z as follows: 1. Compute the coarse level correction 2. Compute the local subdomain solutions 3. Gather the solutions together J U = 2::: 'Ui i = l 4. Compute and retum the corrected solution z o E : a('U -zo, vo) = (r vo) Vvo E Z = 'UZo. (2.31) (2.33) (2.34) (2.35) Algorithm 5 is nothing but a two-level variational multigrid for the problem 8 z = b, with a special smoother defined by (2.32), (2.33). Steps (2.31) and (2.34) play the role of a coarse-grid correction. The fact that a method can be viewed as either a domain decomposition or multigrid is not a coincidence; the development of the recent years shows a tendency towards unification of convergence theory for multigrid and domain decomposition methods (e.g., Bramble, Pasciak, \Vang, Xu [14] \Vang [ 90]). In Xu [ 94], a selection of spaces is described with which the multiplicative domain decomposition and multigrid methods coincide. 51

PAGE 66

2.9 Other Domain Decomposition Methods The wealth of multilevel literature is astonishing. Of the methods strongly related to BDD, we mention two here: The method of Dryja and 'Wid lund [32] uses the same coarse space as BDD and weight exponent t = 1/2 in (2.25), but instead of using the local Schur complement matrices S ( i ) in (2.18), it uses S (i) + c7Af(i) with jyJ(i ) positive definite. This avoids solving singular prob lems. Sarkis [79] obtained an estimate for a method similar to BDD for non conforming elements with any t 1/2. As the main purpose of this brief section is to give a few references to other material deserving attention but not covered in this thesis, we note that do main decomposition methods have been developed and successfully implemented for mixed formulations. There is also the dual approach to domain decomposition advocated by Farhat and Roux [39]. This approach is also suitable for solving problems arising from mixed formulations. For the application of domain decom position techniques to mixed finite element methods see [ 43], [34], [25] and the references therein. The dual approach has been introduced in [39] and further studied in [69], [36] and [55] For domain decomposition methods on nonconforming finite element discretizations see [ 4], [ 2 ], [53], [57], [19] and their references. 52

PAGE 67

3. Fully Black-box Overlapping Schwarz Method In this chapter, we propose a practical solver based on the framework of overlapping Schwarz methods. The practical disadvantage of the existing methods is that their setup usually requires knowledge of the mesh structure for generating the overlaps. Another issue troubling practical usability of overlap ping Schwarz methods is the construction of a coarse space. The possibilities are either to start from a coarse finite element mesh forming nonoverlapping subdomains and obtain the fine discretization mesh by refining or to create an independent coarse mesh. The first approach requires the iterative solver to have control over the discretization while the second brings about the technical difficulties of having to deal with nonnested spaces. \Ve will design and analyze a black-box overlapping Schwarz method with a new efficient coarse space avoiding the aforementioned pitfalls of the traditional approach. The creation of overlapping subdomains will be simplified here. \Ve will start from a system of nonoverlapping subdomains but we will not use the common construction of overlapping subdomains by adding new layers of elements to nonoverlapping ones. Instead, the overlaps will be obtained by a special smoothing procedure based on the stiffness matrix A. The desired amount of ovelapping will then be achieved by algebraic means and controlled by the amount of smoothing done. \Ve will also show that the assumed system of 53

PAGE 68

nonoverlapping subdomains can efficiently be generated as a part of the method itself. The material in this section is based on the ongoing joined efforts with Petr [17}. 3.1 Overlapping Schwarz Method with a Coarse Space. The purpose of this section is to specify requirements on the coarse space and overlapping subdomains that will allow us to prove uniform conver gence. Requirements on the coarse space will be formulated in terms of its basis functions keeping in mind that our final goal is a black-box method efficient on unstructured meshes. H)r this reason we avoid the assumption on the L00 boundedness of the gradient of basis functions. Before we introduce notation specific to this section, let us briefly recall some assumptions on the problem to be solved. Let n c IRd, d E {2, 3} be a Lipschitz domain and 7h be a quasiuniform finite element mesh on 0 of a characteristic meshsize h. Let v'h, be a P1 or Q1 finite element space associated with the mesh 7h with zero Dirichlet boundary conditions imposed at some finite element nodes Vi E r [) c an. Those nodes will be referred to as constrained nodes. For simplicity, we assume that the finite element basis functions 'Pi are scaled so that !I'Pilkoo = 1. \Ve consider the finite element discretization Ax= b (3.1) of the following elliptic model problem: Find u E v'h. such that a(u, v) = (!, v)u(n) Vv E Vn, (3.2) 54

PAGE 69

where d J Du(x) Dv(x) a(u,v) = 2:: a(x)-.-.. --.-.. -dx, i = I o Dxi Dx i 0 < C.t::; a(x)::; C2 for all XED, and v'h. denotes the finite element space with resolution h. Clearly, Let us consider the s ystem of overlapping subdomains {D J }j =1 covering the computational domain D. T he coarse space c Vn. will be defined by its basis, i.e. and local fine level spaces {Vj}f = 1 will be determined b y subdomains n j via (3.3) where f N = [)Q \ f D is the part of the boundary With the natural boundary condition imposed. For the sake of parallelism, we assume that we have a system of index sets c j such that (1) c i = {1, ... J}, ci n cj = 0, i :1 .f. (2) Vk A Vl for every k, l E Cj Denoting by the a(', )-orthogonal projections onto \:i, respectively, the error propagation operator of the method we will anal yze can be written as (3.4) Note that due to the A-orthogonalit y of the spaces vj, Vk for .i, k E C i, we have I 2:: PJ = I1 (IPJ), (3.5) jEC; jEC; 55

PAGE 70

so the method may be viewed as either a hybrid or purely multiplicative. This fact allows for parallelism in the computation of subdomain error corrections. The following two assumptions sum up our requirements on the sys-tem of subdomains and on the coarse space basis functions that will be sufficent for proving uniform convergence of the algorithm with error propagation opera-tor (3.4). Assumption 3.1.1 (Subdomain Geometry). Let D be union ofsim-ply connected clusters DJ of finite elements such that a) diam (Dj)::; CH, .i = 1 ... J. b) :::Jc > 0 Vx ED :::JDJ: x E DJ & dist(x, DDJ \ DD) cH, .i = 1, ... J. c) ::3K1 K2 > 0 : Vx E D the ball B(x, CuH) = {y E D : dist(y,x) ::; CuH} intersects at most K1 + K2C% subdomains DJ. d) meas(Dj) CHd .i = 1 ... J. Assumption 3.1.1 c) will be used in the following context: An object of a diameter O(H) intersects at most 0(1) subdomains D i Assumption 3.1.2 (Coarse Space Basis Functions). \Ve assume that the basis functions i(x) = 1 for every x E Dint and dist(x, fD) ::; CH for every xED\ Dint. Note that the assumption can trivially be satisfied by P1 or Q1 finite element basis functions. \Ve will, however, design a coarse space basis more appropriate for unstructured meshes. As we will demonstrate, the basis of smoothed 56

PAGE 71

aggregation functions described in Section 3.2 is a good candidate. In the rest of this section we will prove the following theorem: Theorem 3.1.3. Under Assumptions 3.1.2 and 3.1.1, for the error propagation operator (3.4) of Algorithm 8 defined in Section 3.2 below it holds that I ITII a 1-c, I I II a= a(, )112 with a constant C independent of H, h. The convergence proof will rely on the use of Lions' lemma [5], for application of which we need the following simple existence result: Lemma 3.1.4. Let Assumption 3.1.1 be satisfied. Then, there exists a set of functions { d J j }j=1 Jj E l:V1 '00 ( Q) such that 1. 14Ji.l vP=(n ) C H-1 2 . '<;"'J .,l -1 ()Il 11 L...j = 1 'h-H, Proof. For each Qj we define Due to parts a) and c) of the Assumption 3.1.1, J 2:: c j = l and from the part b) we also have J """"'. ,,!, > (' L'h j=l 57 for x E 0\ Qj (3.6) (3.7)

PAGE 72

\Ve will show that (3.8) Consider two points 'U, v E Dj. \Vithout the loss of generality we assume ::; for some point P E DD.j \DO. Further, using triangle inequality, Therefore, < H-1 (dist(v u) + dist(u, P)) < H-1dist(v;u) + i } ILip(n ) := sup J r ( J) : x y E D; X f. y ::; H-1 c1stx,y Now, ( 3.8) follows from the well-known equivalence I kip(n ) I lwl=(n) Let us define Due to (3.6) and (3.7), (3.9) Further, from (3.7), (3.8) and bounded intersections of subdomains Dj, denoting the Euclidean norm in lRd by I I IJ, 58

PAGE 73

< 2::: j:xEf2j y E U j = l < cH-1 (3.10) for every x E D \ B, where B is a set of zero m e asure. Finally we set Statements 2. and 3. of the lemma are trivially satisfied by functions 1./Jj. F11rther, (3.8), (3.9) and (3.10) imply I Jw(x)(VCj )(x) + Cj(x)(\l'U ; )(x) II < I'U:(x)l + almost everywhere, which proves statement 1. of the lemma. D Let us define the linear interpolation operator Q : H1 ( n) ---7 by J 1 Q'U = 2::: o:i1\, where O:i = o:i(u) = . u(x) dx. (3.11) meas(D ) n 1 = 1 1 As by Cauchy-Schwarz inequality and by Assumption 3.1.1 d) we obtain the following bound for the coefficients of interpolation Q: (3.12) 59

PAGE 74

The proof of the following lemma is essentially standard, except for certain technical difficulties stemming from allowing rather general geometry of subdomains nj. Lemma 3.1.5. Under Assumptions 3.1.2, 3.1.1, for the interpolation operator Q defined by (3.11) and every u E Hci,ro (D) it holds that (3.13) and (3.14) where C is a constant independent of H and h. Remark 3.1.6. The inequality (3.13) is one of the forms of the so called weak approximation property (cf., [15]). It can be contrasted with the traditional approximation property used in the multigrid and finite element literature. Proof. Let us set B = D \Dint, where Dint is the domain introduced in Assumption 3.1.2, b). Further we define B = {i: D i n B f 0}, B1 = U Di, TV= sup{dist(x,fv)} iE13 xEB' and set B0 ={xED: dist(x,fv) From Assumption 3.1.1 it immediately follows that VV therefore the inequality yields (3.15) 60

PAGE 75

Then, the restriction of Qu onto B can be expressed as (Qu)(x) = L ni(u)
PAGE 76

In the rest of the proof we will verify (3.17) and (3.18) on the domain Dint (see Assumption 3.1.2). FN the convenience of the proving, let us consider the extension UE of the function 'U E HJ,rn ( n) satisfying (3.19) Let us recall that Ni = {.j : Dj n Di :::/=-0}. For i = 1, ... J we define B; = UjEJ'/j Dj and Bi to be a ball circumscribed about Ftom Assumption 3.1.1 it immediately follows that diam (Bd s; CH and therefore, we have the Friedrichs inequality in the form JJuJJ uu3; ) s; CHJuJul(H;) V'U E {v E H1 ( Bi) : { v dx = 0}. lu, (3.20) Further, due to Assumption 3.1.1 a), c), the intersections of balls Bi are bounded. For every .i = 1, ... J we define (3.21) Then, the Friedrichs inequality (3.20) holds for every flj Due to Assump tion 3.1.2 b), for X E Din [tint it holds that (Qu)(x) (3.22) Therefore, 62

PAGE 77

Further, J L II(I-CJ)(fii + (;)I I J J z(n;nnint) i = l J L I I(J-CJ)uilli_z(n,nnint ) i = l J < 2 L (llui!IJJ z(ll;) + IICJuiii J J z (n,nnint)). ( 3.23) i = l I ICJuiii JJz(n ;nnint) < I I L nJ(u i)
PAGE 78

J ICJ('ui + (;) (!l;nnint) i = l J ICJui + cilifl(n,nnint) i = l J "" 2 L ICJuilul(n,nnint) i=l J "" "" 2 < L I L O:'j(ui);ln ( n ) ) 2 < (card (N, ) ;E, aJ (u ) Iii>; lir'(fl)) [ used (3.21)] [ used (3.22) ] [Assumption 3.1.2a) and (3.12) ] J < CH-2 l ui!liJ2(H;) i = l i = l J "" 2 C L !uE!ul(H;) i = l 2 < CI'Uiu1( n ) [used Friedrichs inequality ] [ used bounded intersections and (3.19)] which together with (3.18) completes the proof of (3.14). D The previous lemma allows us to prove existence of a H1-stable decomposition of function u E v'h. into subdomain components. Lemma 3.1. 7. Under Assumptions 3.1.2, 3.1.1, for every finite element function u E v'h., there is a (not necessarily unique) decomposition {ui} L0 u i E V i such that J U = and i = O 64

PAGE 79

where constant C is independent of h, H. Proof. Let us define the operator In: C0(D) ---r v'h. by n ln(a) = L = ), i = l where { vi}j =1 is the set of finite element nodal points, is the finite element basis, and Ilx = is the finite element interpolator. Let us consider the basis {4;1}L1 from Lemma 3.1.4. As diam (supp(;1)) :::; CHand :::; CH-1 we also have Let us define the decomposition uo Qu, ui i = 1, ... ,J, where w (IQ)a and q is the interpolation operator from Lemma 3.1.5. As w is a finite element function and I:/= 1 VJi = 1, J J Lui= h(L ;iw) = h(w) = w, i = l i=l proving validity of the first statement of this lemma. Further, for i = 1, ... J it holds that lui l u1(!l; ) 6,.. 0

PAGE 80

Cllw\74;i + 4 ; i\7tvli[L2(n;)]'t < C(llw\74;i I I + IIV;i \ltv II < + < C + ltvlul ( !! ;)). Therefore, owing to the bounded intersection property of subdomains ni the approximation property (3.13) and the energetic stability (3.14), which completes the proof of the second statement of this Lemma. D To complete the proof of Theorem 3.1.3, we need the following two abstract results. Lemma 3.1.8 ([5]). Let V be a Hilbert space with an inner product a(,), and Vi denote subspaces of v v = Uf= o vi, with inner product a(-,). Further let operators P; : V ---7 V i be a(-, )-orthogonal projectors. Then if there exists a constant CL > 0 such that J J V v E v ::lui E vi : v = E V i and L a(vi, vd CLa(v, v), i = O i=O then 66

PAGE 81

The following lemma is a straightforward simplification of [89 Theo rem 3.2] suitable for our purposes. Lemma 3.1.9. Let (i) There exists a constant CL such that J a(v v) CLa(L v) Vv E i/. i = O (ii) Let c. = { Eij JiJ=I be a symmetric matrix such that l 2 l 2 . T 1 a(Prv, v ) 1 Vu,v E \:, i,.i = 1, ... ,.1. Then the product algorithm with error propagation operator (I P0 ) (I P1 ) (I PJ) is convergent with the rate bounded by 1 2' C'"(1 + e(c.)) \Ve can now return to proving Theorem 3.1.3. The assumption (i) of Lemma 3.1.9 follows from Lemmas 3.1.7 and 3.1.8. From bounded overlaps property of subdomains n i we obtain e(c.) c, where the constant Cis independent of the numbering of spaces V i i = 1, ... .J. Therefore, Lemma 3.1.9 yields n" II(I II (I1 -c i = 1 jEC. ; where are decomposition sets introduced in at the beginning of this section. Now the proof of Theorem 3.1.3 follows from (3.5). 67

PAGE 82

3.2 Smoothed Aggregation Coarse Space and BOSS In this section we define a coarse-space based on the concept of smoothed aggregation introduced in [87]. Overlapping subdomains will be defined based on the nonzero structure of prolongator. The method described here allows black-box implementation; its only input is a system of linear algebraic equations Ax = f and the system of aggregates of degrees of freedom. Assumptions on aggregates allowing the proof of uniform convergence will be given in the next section. Let {.At}/ =1 be a given system of aggregates of nodes forming a disjoint covering of the set of all unconstrained nodes i.e. J U At= {1, ... n}, At n AJ = 0 fori :f j. i = l \Ve define a vector li E lRn as follows: for node .i E At (3.24) elsewhere. Then we define the tentative prolongator P to be an n by J matrix such that its i-th column is equal to li. Note that this grouping of nodes into disjoint sets and subsequent identification of each set with a single degree of freedom on the coarse space is referred to in the literature as aggregation technique. It was introduced in the early 1950's b y Leontief [58] and frequently used in the problems of economic modeling ( cf., lVIandel and Sekerka [68] and the referenc e s therein.) In order to eliminate oscillatory components from the range of P, we introduce an n by n prolongator smoother S and define the final prolongator P 68

PAGE 83

bv P=SF. (3.25) The following algorithm describes construction of the polynomial prolongator smoother S suitable for our purpose. The key property of the resuiting smoother is that g{S2A) Cdeg(S)-2g(A), which allows to prove that the H1-seminorm of our coarse-space basis functions is sufficiently small (Lemmas 3.3.2, 3.3.3). Note that there is no need to physically construct the prolon-gator smoother S. For the sake of the method's implementation, we need only the final prolongator P =SF, which can be generated in O{n deg{S)) operations in a single processor environment. Algorithm 6. For a de s ired degree ds of the prolongator smoother S and an estimat e of the spectral radius of A such that (3.26) we define the prolongator smoother by 1. Let K = l loK3 ( 2ds + 1) J -1 where l J is the truncation to the nearest smaller integer. 2. For i > 0, set i!i /r. and compute S defined by the recurrence formula A0 A, 69 fiK (J1 ,;-I :1 ) '"llE.'I'E' :1 1". j = O
PAGE 84

Remark 3.2.1. Algorithm 6 is capable of generating smoother S of certain degrees only. The choice of K in the step 1. give s S of degree closest to the desir e d one ds se e ( 3.29). If we set for i E 1N i ( 4 ) si = rr 1 ;=1 3 then for the prolongator smoother S created by Algorithm 6 we have S we get deg ( 8 i -1) + deg(Ai-1) = deg(8i 1 ) + 3 i -1 i 1 3 i -1 "'3] = -L 2 j = O T herefore, S = satisfies ds 3 < deg (S) ::; ds. (3.28) (3.29) The nonzero structure of the prolongator P determines the supports of our coarse space basis functions (3.30) Here IIx = I:f =1 x d > i is the finite element interpolator U>i } the finite element basis and e i the i -th vector of the canonical basis. The prolongator P is obtained as a result of the matrix multiplication P = SF, where P is the tentative prolongator and S is the polynomial in the stiffness matrix A given by Algorithm 6. 70

PAGE 85

The computational subdomains D i are derived from the nonzero struc-ture of the matrix psymb, which is obtained in the same way as the prolongator P, except the matrix operations involved are performed only symbolically. That is, we replace nonzero entries of matrices by ones and use the arithmetic 1 + 0: = 1, 0 + 0: = o:, 1 0: = o:, 0 0: = 0 for o: = 0 1. Then, defining (3.31) we have (3.32) Note that if the sparse matrix operations are implemented so that numerical zeroes are never dropped, the results of these symbolic operations are obtained for free as a side benefit of the computation, so the symbolic operations need not be performed at all. Algorithmically, this can be accomplished as follows: First, for each column of the smoothed prolongator psymb let u s define the li s t of its nonzeroes 'r { psvmb __;_ ()} JV j = 'f, : i) T n j = card (Nj) and then b y nj 0 1 matrix Nj resulting from selecting the columns with indices in JVj from the n by n identity matrix. Further we define local matrices Ai and local correction operators R i A 7\ T T -t l\T _.,_f-li == 1 'i i _.,_J-}_j 'i i i = 1 ... J. (3.33) Analogously, for the coarse l e vel we set T -: 1 T Ao = P AP, Ro = P ( Ao) P (3.34) 71

PAGE 86

I 2 I 2 I 2 3 4 3 4 3 4 I 2 I 2 I 2 3 4 3 4 3 4 I 2 I 2 I 2 Figure 3.1: Possible a ss ignment of elements to different classes C i in 2D. For a positive i RiA is the A orthogonal projection onto the local space V i = {x E mn : X j = 0 for .i f: Ni}. Note that (f'i, I I I IA.) is the vector space isometricall y isomorphi c to the s pace of finite element functions (vi = {:Li = l Xif!i, X E Vi}, a(-, )112). Also, vi intro duced this way satisfies ( 3.3). For the sake of parallelism we need a disjoint covering of the set {1, ... J} satisfying cos(fj, Vic) = 0 for every .i, k E c i, i = 1, ... 'no (3 .35) where the c osine i s measured in A inner product. For structured meshes, such a covering can e asil y b e obtained. H)r simplicit y of demonstration, consid e r 2D and 3D rectangular me s hes. F igure s 3.2 and 3.2 depict possible decomposition into noncontiguou s groups forming Ci in 2D and 3D, respectively. 72

PAGE 87

/ / / / 5 6 5 v 7 8 7 v 6 5 / / / / I/ 1 2 1 / 3 4 3 / 2 1 / / / / / 5 6 5 / 7 8 7 / 5 6 5 / Figure 3.2: Pos sible a s signment of elements to different classes C i in 3D. 73

PAGE 88

\Ve can see that for the rectangular meshes the elements can be subdivided into four noncontiguous groups in 2D and into eight in 3D. \Ve are, however, seeking a similar decomposition for any (unstructured) mesh. For unstructured meshes such a decomposition can be created using a simple greedy algorithm, as the information about the orthogonality of spaces is available. Trivially spaces and V; are orthogonal if akt = 0 for all k E JVi, l E J\0, where akt are entrie s of the stiffness matrix A. Such a test can be easily performed using the formula .. tsymb = psymb'J' :tsymb psymb .'clc .'cl where is the operation of the s y mbolic matrix multiplication. Then, The disjoint covering of the set of coarse space degrees of freedom satis fying (3.35) can be created using the following algorithm: Algorithm 7. Set R = {1, ... J}, i = 0 and 1. repeat 2. set i +--i + 1, 4. repeat 6. set ci +--ci u {.j}' 7. for each k : = 1 set R i +---R i \ {k} 74

PAGE 89

8. until ni is empty, 10. until n is empty 11. set n c = t. Now we have all the components needed to write down the implemen-tation of the Schwarz method with the error propagation operator (3.4). Algorithm 8 (BOSS). Given a vector x i the method returns xi+I computed as follows: 2. Local corrections: for i = 1 ... 'nc do z i = zi-I + 2::: RJd!, where d! = f Azi I jEC; and R j is the correction operator defined in (3.33). 3. Coarse level correction: where R0 is the coarse-level correction operator given by (3.34). 5. (optional) fori= nc, ... 1 do 6. Set xi+I = z0 zi = zi-1 + 2::: Rjd!, where d! = f Azi-1 jEC; Note that if the optional post-smoothing step 5 is used, the algorithm can be used as a symmetric preconditioner in the conjugate gradient method. 75

PAGE 90

Remark 3.2.2. For the sake of brevity, we denote the method de scribed by Algorithm 8 with the choice of operators Ri, R0 given by (3.33), (3.34) as BOSS, short for Black-box Overlapping Schwarz with Smoothed Coarse Space. \Ve will find this abbreviation useful in the tables of Chapter 6. 3.3 Estimates for Smoothed Aggregation In this section, we apply the general estimates to smoothed aggregation method defined in previous section. In order to prove the convergence of the method with the coarse space generated by smoothed aggregations, we only have to verify that Assumptions 3.1.2 and 3.1.1 are satisfied. The pattern of the stiffness matrix A = {aij }f,j= I determines the undi rected graph g = {V,E}, where vertices V = {1, ... n} are indices of all unconstrained nodes and edges E are given by E = {[i,.i] E v X v : O:ij of 0}. For i E V and a positive integer r let us define the graph r neighborhood of i by B(i, r) = {.j E V : dist(i, .i) S: r}. Here, the distance of two vertices i, .i is the minimal length of path connecting i, .i measured in the number of edges forming the path. In the rest of this section, we will prove the optimal convergence re sult under the following assumption on the system of aggregetes The first part of the assumption controls aspect ratios of aggregates. The second 76

PAGE 91

part specifies the number of smoothing steps involved in the construction of the prolongator smoother. Assumption 3.3.1. There are positive integer constants c C C1 C2 and a positive integer n characterizing the graph size of aggregates such that a) In each aggregate Ar there is a node .i satisfying B(.j, co:) C Ar, and dist(k, .i) ::; Co: for every k E Ar. b) For the degree ds of prolongator smoother it holds that The decomposition satisfying the Assumption 3.3.1 can be easily gen-erated using a simple greedy algorithm, see Algorithm 9 in the Section 3.4. The following simple algebraic result is the key tool needed for verifi-cation of the Assumption 3.1.2 a). Lemma 3.3.2. For the prolongator smoother S created by Algorithm 6 it holds that Proof. ([88}). Throughout this proof, we use the notation introduce d in the Algorithm 6. First, by induction, we prove g(Ai ) ::; Oi For i = 0, the inequality holds by (3.26); assume it holds for .i ::; i. Then, by and the inductive assumption 77

PAGE 92

Hence, Now, the second estimate g(S) ::; 1 follows immediately from the fact that S is a product of terms of the form I -1t?j1 Aj. Let us estimate the spectral bound of S2 A. It is routine to verify that Therefore, taking into account that (see Remark 3.2.1) 3K Cds Cdeg(S) and t?o = t?::; C!?g(A) (see (3.26)), we get which was to be proved. D The following lemma demonstrates validity of Assumption 3.1.2, a). Lemma 3.3.3. Under the Assumption 3.3.1, for coarse space basis functions defined by (3.30) it holds that (3.36) where H = exh. Proof. Taking into account the underlying quasiuniform P1 or Q1 finite element mesh the number ex characterizes the "discrete diameter" of aggregates we have Further, due to the Lemma 3.3.2, 2 c e(S A) ::; -2 e(A). ex 78

PAGE 93

Therefore, using the fact that g(A) ::; Chd-2 Similarly, using the fact that e(S) ::; 1 (Lemma 3.3.2), completing the proof. 0 2 I I IISPetll L z(n) < Chd(Sli, Sli) < Chde(S2 ) Remark 3.3.4. The estimate for the smoothed functions
PAGE 94

Proof. The assumption a) follows from the Lemma 3.3.3. The as-sumption c) has been verified in the previous section, see (3.32). Let us prove b). Basis functions derived from the tentative prolongator P, i.e. _, _.._
PAGE 95

we have wi = 0 for every i such that dist(i, 'D) deg{S). As I: .:Pi = IIS'U, the decomposition of unity .:Pi = 1 is violated at most at deg{S) + 1 strips of elements surrounding rD. This, together with deg{S) s; Co: and H = o:h completes the proof. D Lemma 3.3.6. Under Assumption 3.3.1, the computational subdo mains D i defined by { 3.31) satisfy Assumption 3.1.1. Proof. The proof consists of simple, but rather tedious geometrical considerations. Computational subdomains are defined by Di = supp(IIS"ymb p e i), where is the operation of the symbolic matrix multiplication and ssymb is the pol y nomial in A created using symbolic matrix operations too. The degree of ssymb satisfies Further, the support of basis function derived from the tentative prolongator supp{IIPei) is formed by all elements Tj such that at least one vertex of Tj belongs to the aggregate The smoothing by ss.vm b adds deg{s s.vmb) layer s of surrounding finite elements. Taking into account the quasiuniformity of the underlying mesh and the fact that H = o:h;:::; deg{Ssymb)h the measure of added deg(Ssymb) layers of elements itself is greater or equal to C Hd. So, 81

PAGE 96

which proves the Assumption 3.1.1, d). Also, due to Assumption 3.3.1 a) w e s imilarly obtain diam (Di ) s; CH. Hence a) i s also verified. Let us prove b). The s upports of basis fun c tions IIPei cover the domain D. A s Di is created by adding deg(S) co: lay ers of elements to supp(IIPei), we have where distiR
PAGE 97

Theorem 3.3.7. Let the Assumption 3.3.1 holds. Then, for the e rror propagation operator T of the method described in the Section 3.2 applied to the model problem ( 3.2) it holds that I ITII A 1 -c where C is a constant independent of h and the size of aggregates. Proof. The proof follows immediately from Lemmas 3.3.5 3.3.6 and Theorem 3.1.3. 0 3.4 Practical Issues The overlapping method with the coarse space given by smoothed ag gregations has very favorable convergence properties common to most overlap ping Schwarz methods with a coarse space. The n e w method has, how ever, certain advantages over the existing overlapping methods. The advantages of the smoothed aggregation approach include: It can be implemented as a black box with no input required from the user except for the stiffness matrix and the right hand side of the problem. Even though the analy sis assumes an elliptic functional discretized on a quasi-uniform mesh by P1 or Q1 finit e elemen ts, nu merical experiments confirm applicabilit y of the method to unstructured me s hes and a variety of finite element types and problems far beyond the scope of the current theory. The disadvantage common to all overlapping-t ype domain decomposi tion methods is the increase of computational complexity with increasing mea sure of the overlap. Our method cannot avoid this drawback, but the use of "coloring'' in the definition ( 3.4) allows parallel implementation of local solves 83

PAGE 98

and reduces the processing time. In the rest of this section we discuss ways to generalize the method for solving nonscalar problems. \Ve also analyze the computational complexity of the method and a practical algorithm that can be used to generate the aggregates 3.4.1 Generation of Aggregates \Ve now describe a greedy algorithm which will generate the subdomains satisfying t he Assumption 3.3.1. First we extend the definition of graph neighborhood of a node to the graph neighborhood of a set X C {1, ... n} of nodes: B ( X o:) = {i: dist(i, X) ::; o:}. 'With this definition, we can write the Algorithm 9. For the given stiffness matrix A and positive integer o:, create the system of aggregates as follow s : 1. Set R = {1, ... n}, j = 0. 2. for i = 1, ... n do 3. if i E R then 4. if B( {i}, o:) C R then 5. _j<;-_j + 1, 6. 6. \ A j, 7. end if 8. end if 84

PAGE 99

9. end for 10. set J = .i, 11. for i = 1 ... J 12. u n) n R), 13. R+--R \ 14. end for In order to full y complete the aggregate generation description, we give an algorithmic recipe for computing the n-neighborhood of a set X used in Algorithm 9 Algorithm 10. Given a set X c {1, ... n}, { 1 ifi EX 1. Set w E mn as 'Wi = () otherwise. 2. Set w = An w (both the power and the multiplication are performed symbolically.) 3. Set B(X, n) = {i : wi = 1 }. Remark 3.4.1. Algorithm 9 generates a disjoint covering of the s e t of all vertices that satisfies Assumption 3.3.1. In ste ps Ll 0. Algorithm 9 generates graph neighborhoods = B( {.j}, a). After Step 10., the set R con-tains the remaining nodes that could not be made into whole nneighborhoods. For these nodes it holds that V .i E R such that dist(.j, Ste ps 11. through 13. add at most o; "lay ers" of surrounding vertices to some of the aggregates It follows from the construction that at the end of Algorithm 9, R = 0. 85

PAGE 100

3.4.2 Nonscalar Problems The method can easily be modified for solving nonscalar problems. \Ve will briefly describe the changes required. This approach first appeared in [87] in the context of solving problems of order higher than 2. In order to apply the method to nonscalar problems, we need the knowledge of the discrete representation of the local kernel of the bilinear form a(-, ). By local kernel we mean the kernel in absence of essential boundary conditions, i.e. the kernel of the unconstrained problem. Let us assume that we have functions { jJ} )!:;,1 spanning the local kernel of a(-,) (in case of 3D elasticity, 6 rigid body modes). For each function Ji, we need its discrete representation with respect to our finite element basis, or the vector P such that -" r = rrr. Finite element packages usually provide this information. For every aggregate let us define the set 1J1 of all degrees of freedom associated with nodes of Then the tentative prolongator can be constructed by Algorithm 11 (Tentative prolongator-nonscalar problems). For every aggregate and for j = 1, ... 1. H)r ji, compute the vector jiJ E lRn with components { . _,__ If k E 1)i !I_J--k 0 otherwise. 2. Interpret the vector jiJ as the -1) + .i-th column of the tentative prolongator P. 86

PAGE 101

The algorithm in this form can be used to treat quite general bases (e.g. unsealed bases, high order elements or common plate and shell elements). In order to improv e conditioning of the coarse problem it is advisable to perform the discrete l2 orthogonalization of vectors jiJ on each aggregate as suggested in Vanek, lVIandel and Brezina [ 87], or by simply computing their l2-orthogonal projections onto complement of a constant. This is not required by the theory, but practical applications can benefit from such a stabilization. \Vhen solving problems of 2D and 3D elasticity it is possible to obtain the basis {fJ} explicitl y for each subdomain without having to scale it; this construction relies on the fact that the rigid body modes are known and was presented in [83]. The above mentioned approach is, however, more general. 3.4.3 Computational Complexity \Ve will now give an asymptotic bound on the amount of floating point operations needed to carry out the iteration to reduce the error to the truncation level. \Ve will give the estimates for implementation on both serial and parallel architectures. Let Ne8 denote the typical number of elements per subdomain d the dimension of the space on which the continuous problem is cast, and n the number of degrees of freedom in the whole system. Let us first compute the amount of work needed for the setup. On a machine with a sigle CPU, we need 0( deg(S)n) operations to compute the prolongator p = sP. Taking into account that deg( S) ;:::; ;:::; NHd; this a
PAGE 102

to compute the Cholesky factorizations of the local and coarse level matrices, respectively. \Ve also need O(n) operations to compute the coarse level matrix, but this number can be taken out of the consideration, as it is dominated by the other expenditures. 2
PAGE 103

2<1-J Each step of the iteration will require O (Nes'1 ) and 0( z ) operations to compute the back-substitutions in the local and coarse spaces, respectively. Balancing these values, we obtain that the optimal size of a subdomain is about n112 in both 2D and 3D. The resulting computational complexity can then be bounded b y O(n) in 2D and b y O(n716 ) in 3D. The above discussion together with the convergence Theorem 3.1.3 prove s the following theorem: Theorem 3.4.2. Let Assumptions 3.1.2 and 3.1.1 be satisfied, and the Cholesky factorization be used to solve the coarse-level and local subdomain problems. Then, on a serial architecture the optimal number of elements per su bdomain is N;F ;=:j n in 2D and N)}J ;=:j n rt in 3D and the system ( 3.1) can be solved to the level of truncation error in O(nrt) operations in 2D, and O(nil*) operations in 3D. If a parallel architecture with n112 processors were available, the optimal number of elements per subdomain would change to N;}J = N2:J ;=:j and the system ( 3.1) could be solved to the level of truncation error in O ( n) operations in 2D, and O(n* ) operations in 3D. T he above estimates show that the amount of work required to complete the whole iterative process (including its setup) is as y mptotically lower than even just the back-substitution step of direct methods based on matrix fa c torization, which would b e O(n'3i2 ) and O(n51'3) in 2D and 3D respectively. 89

PAGE 104

4. Nonoverlapping Methods with Inexact Solvers This chapter deals with the issue of nonoverlapping domain decomposi tion methods using only in exact subdomain solvers. In Section 4.2, we formulate requirements on the approximate solvers we will use and their properties. In the sections that follow we define a fully algebraic nonoverlapping domain decompo sition method with inexact subdomain solvers and prove the condition number estimate of the resulting algorithm. \Ve will also study the influence of using approximate coarse space problem. 4.1 Inexact Subdomain Solvers The reason many domain decomposition methods use the reduced system is that the condition number of the problem to be solved is reduced from 1/h.Z to 1 / h (see Lemma A.l.7 or Bramble [ 9]). The resulting method is proved to behave better than applying the diagonal preconditioner to the original prob lem ( l\!Jandel [ 61]). However, solving the local problems by direct methods can be costly especially if only a modest number of subdomains is used. Therefore number of attempts to replace direct solvers in solving the subdomain problems by significantly cheaper iterative methods have been made. One of the earliest of these efforts is due to Bi)rgers [8] who proved for the Neumann-Dirichlet domain decomposition and the case of two substructures that convergence independent 90

PAGE 105

of the meshsize can be obtained if the exact local solvers are replaced by a small number of multigrid iterations. Another approach was described for the Neumann-Neumann method in [ 32}, where theory is based on the abstract Schwarz method fram e work recalled in Section 2.1. The method of [33] use s inexact solvers for the Neumann problems under the assumption that the local Schur complements are known. Because the local Schur complements define the global one, this approach is only appropriate in the case the global Schur compl e ment 8 is computed. This, however, has to be avoided in order to keep the computational complexity down. An improved version of the same algorithm, using the formulation with local stiffness matrices A l i) instead, appeared in [ 30]. An elegant way of application of inexact local solvers can also be found there. The disadvantage of this method is its utilization of the finite element basis used for discretization of the problem. One of the most recent attempts to address the issue of inexact subdo main solvers is due to Bramble, Pasciak and Vasillev [13] Their method is based on a trivial approximation of harmonic extensions, which results in reduced computational complexity, but allows only a suboptimal condition number estimate with linear dependence on the ratio If;. An attractive feature of domain decomposition methods used as precon ditioners in the method of preconditioned conjugate gradients is that the matrix of the problem never has to be assembled. Although this is a general property common to all domain decomposition methods, it is perhaps most clearl y demon strated on the example of the EBE methods of Section 2.6, where (except for certain implementations) only local element submatrices need to be known and 91

PAGE 106

stored. The action Ax of the global matrix, required by PCG, can be obtained by the subassembly of the actions of local matrices J ix = 'f\ T :1 r 'f\Tx L T (") _...J-}_. 1 \1. _...J.j_ J.\l ... i = l Thus, only the local matrices A(i ) or their factors have to be stored. An important prerequsite for the application of inexact solvers is reformulating the problem in such a way that the local Schur complements do not figure at all in the formulation of the problem to be solved. That is, special care has to be applied so that using inexact solvers does not affect the problem to be solved. For instance, if the local Schur complement matrices S ( i ) were straight forwardly replaced by their approximations /j(i), the problem to be solved would be ,';J; = b, the solution of which may be completely irrelevant to that of Sx = b. On the other hand, forming and storing exact Schur complements S (i) as well as their approximations does not seem to offer any advantage over storing only the Choleski factors of S(i), which is the common practice. A proper formulation may be found in the early domain decomposition paper of Bramble, Pasciak, Schatz [10]. \Ve give another suitable formulation in Section 4.5.2. Both are formulated in terms of the local stiffness matrices A (i); the main difference between the two is that the formulation in [10] is nonalgebraic. Another significant difference is that we are using a smaller and more efficient coarse space (with one degree of freedom per subdomain for scalar second order problems). \Ve present two substructuring methods avoiding the exact subdomain solvers. First of them is an extended BDD. The second one presented in Chapter 5, is more unorthodox and is closely related to the multigrid method with smoothed aggregation [52, 86, 87]. 92

PAGE 107

4.2 The Inexact Solvers' Properties Our aim is to replace the exact solvers with approximate ones. In order to simplify the notation, in this section we describe the inexact solvers and state some of their properties used in our proofs. Assume that all algebraic systems Ax= f (4.1) with a n x n symmetric and positive (semi)definite matrix A are solved by an application of an iterative method x+-JV!x + Nf (4.2) consistent with (4.1) (i.e., having the same solution), with A-symmetric matrix JV! satisfying (4.3) The following lemma gives a condition number estimate for this procedure. Lemma 4.2.1. Assume that the iteration (4.2) is consistent with (4.1), and that it is convergent in the sense of ( 4.3). Then (1q)(A-1x,x) s; (Nx,x) s; (1 + q)(A-1x,x) Vx E lRn (4.4) and --1 --1 (1q)(N x,x) s; (Ax,x) s; (1 + q)(N x,x) Vx E lRn (4.5) (the constant q can be made arbitrarily small provided sufficiently many steps of iteration (4.2) are used). 93

PAGE 108

Proof. Let us first assume that A is nonsingular. The consistency of (4.2) implies 1H =I-NA. Using Cauchy-Schwarz inequality, we have Thus, we obtain ( 1 -q)lx x)
PAGE 109

'When computing the approximate harmonic extension E, we will apply ( 4.2) started from zero initial approximation in the interiors to the problem (4.9) The following lemma gives an estimate for the case only one iteration of ( 4.2) is used to approximate the harmonic extension Lemma 4.2.2. Let us denote by h the charateristic meshsize. Denoting by E the operator of discrete harmonic extension, we have for the inexact harmonic extension E the following estimate where q is the parameter from ( 4.3). Proof. Let us denote by q : V ---7 the linear operator defined as Qu = { 0 for x E DD, 0 u for xED. Since we assume the iteration (4.2) for problem (4.9) is started from zero approximation ( in the interiors), we have for the initial error e0 = q Eu and after one iteration e1 = j\:J e0 Therefore, using the definition of q and ( 4.3), I I(E-E)u11;1 = 11Qetll;1 = lletll;122 = I IAfeoll;b ::; q2lleoll;122 = q2IIQeoll;1 = In order to conclude the proof, it suffices to realize that I I Q IIA ::; D 4.3 Matsokin-Nepomnyaschikh Abstract Theory The abstract theory presented by Nepomnyaschikh in [73] provides an alternative approach to describing domain decomposition methods, cf. also [55] 95

PAGE 110

R* A B l:l _____ TL:_j R Figure 4.1: Abstract framework scheme. 4.3.1 Abstract Framework and Condition Number Estimate Following [55], let us consider Hilbert spaces H0 H and self-adjoint, elliptic operators A, B on H0 H, respectively. Also consider linear continuous extension and restriction maps T: H0-+H and R: H -+H0 so that RoT= Iu0 (4.10) Consider solving the problem Find u0 E H0 : A'Uo = Lo in Hb (4.11) by the conjugate gradient method with the preconditioner (4.12) The following theorem gives an abstract condition number bound for this process. Since no written reference for the proof of the theorem is known to the author, we include it here for completeness. The proof follows discussion [54] 96


Theorem 4.3.1 ([73]). For the preconditioner (4.12) it holds that

    cond(M^{-1}A) ≤ C_U C_T,    (4.13)

with

    C_U = sup_{v ∈ H} (ARv, Rv)/(Bv, v) = ‖R‖²_{B→A},    (4.14)

    C_T = sup_{u_0 ∈ H_0} (BTu_0, Tu_0)/(Au_0, u_0) = ‖T‖²_{A→B}.    (4.15)

Proof. First note that

    M^{-1}A = RR*.    (4.16)

Indeed, by the definition of R*,

    (R*u, v)_B = (u, Rv)_A  ∀u ∈ H_0, ∀v ∈ H,

i.e., (BR*u, v)_{L2(H)} = (Au, Rv)_{L2(H_0)}, so R* = B^{-1}RᵀA, from where (4.16) readily follows. From (4.16) and the definition of the condition number,

    cond(M^{-1}A) = cond(RR*) = λ_max(RR*)/λ_min(RR*).

We have

    λ_max(RR*) = λ_max(RB^{-1}RᵀA) = λ_max(R*R) ≤ sup_{v ∈ H} (R*Rv, v)_B/(v, v)_B


    = sup_{v ∈ H} (ARv, Rv)/(Bv, v) = C_U.

For the lower bound of the spectrum we have

    λ_min(RR*) = inf_{u ∈ H_0} (RR*u, u)_A/(u, u)_A = inf_{u ∈ H_0} ‖R*u‖²_B/‖u‖²_A
               = inf_{u ∈ H_0} sup_{v ∈ H} (R*u, v)²_B/(‖u‖²_A ‖v‖²_B)
               = inf_{u ∈ H_0} sup_{v ∈ H} (u, Rv)²_A/(‖u‖²_A ‖v‖²_B)
               ≥ inf_{u ∈ H_0} (u, RTu)²_A/(‖u‖²_A ‖Tu‖²_B)
               = inf_{u ∈ H_0} ‖u‖²_A/‖Tu‖²_B,

therefore

    1/λ_min(RR*) ≤ sup_{u ∈ H_0} ‖Tu‖²_B/‖u‖²_A = ‖T‖²_{A→B},

which concludes the proof. □

Note that the operator T appearing in the estimates is introduced for the purposes of the theory only; it is tied to R by (4.10), but it does not play any explicit role in the preconditioner itself.

4.3.2 An Application: Abstract Additive Schwarz Methods

The generality of Theorem 4.3.1 will be demonstrated by reproving the statement of Theorem 2.1.1 within our current framework. Let

    V = V_0 + V_1 + ... + V_J,  H_0 = V,  H = V_0 × V_1 × ... × V_J,


and let (·,·) denote the L2 inner product. On H define the inner product

    ((v_0,...,v_J), (w_0,...,w_J))_H = Σ_{i=0}^J (v_i, w_i)_{V_i},

the bilinear forms

    a(u, v) = (Au, v)  ∀u, v ∈ V,   a_i(u, v) = (A_i u, v)  ∀u, v ∈ V_i,

and symmetric positive definite operators B_i on V_i representing the (possibly inexact) local solvers, collected in B = diag(B_0,...,B_J) on H. Since V = V_0 + ... + V_J, we may define

    R: H → H_0,  R: (v_0,...,v_J) ↦ Σ_{i=0}^J v_i,

and the map Tv = (P_{V_0}v, P_{V_1}v, ..., P_{V_J}v), where P_{V_i} is the L2-orthogonal projection onto the space V_i. Finally, define the operators T_i: V → V_i by

    (B_i T_i w, v) = (Aw, v) = a(w, v)  ∀v ∈ V_i.

These are the approximate projections. With these definitions we obtain

    M^{-1}A = RB^{-1}RᵀA = Σ_{j=0}^J T_j,

and we can prove the following theorem.


Theorem 4.3.2 (Dryja, Widlund [30]). Assume that

(1) there exists a linear continuous operator T: H_0 → H, T: v ↦ (v_0,...,v_J), so that

    R ∘ T = I_{H_0}    (4.17)

and

    Σ_{i=0}^J (B_i v_i, v_i) ≤ C_0² (Av, v);    (4.18)

(2) there exists a constant ω > 0 so that

    (Au, u) ≤ ω (B_i u, u)  ∀u ∈ V_i, i = 0,...,J;    (4.19)

(3) there exist constants ε_ij, i, j = 1,...,J, so that

    a(u_i, u_j) ≤ ε_ij a(u_i, u_i)^{1/2} a(u_j, u_j)^{1/2}  ∀u_i ∈ V_i, ∀u_j ∈ V_j.    (4.20)

Then

    cond(M^{-1}A) ≤ C_0² ω (ρ(ε) + 1),

where ρ(ε) denotes the spectral radius of the matrix ε = (ε_ij).

Proof. We will use the abstract estimate of Theorem 4.3.1. First, by (4.18),

    C_T = sup_{v ∈ H_0} (BTv, Tv)_H/(Av, v) = sup_{v ∈ H_0} Σ_{i=0}^J (B_i v_i, v_i)/(Av, v) ≤ C_0².

Now, let us turn to estimating C_U. Using (4.19) and (4.20), we obtain, for v = (v_0,...,v_J) ∈ H,

    C_U = sup_{v ∈ H} ‖Σ_{i=0}^J v_i‖²_A / Σ_{i=0}^J (B_i v_i, v_i)
        ≤ sup_{v ∈ H} (‖v_0‖_A + ‖Σ_{i=1}^J v_i‖_A)² / Σ_{i=0}^J (B_i v_i, v_i),

with ‖Σ_{i=1}^J v_i‖²_A ≤ ρ(ε) Σ_{i=1}^J ‖v_i‖²_A ≤ ρ(ε) ω Σ_{i=1}^J (B_i v_i, v_i) and ‖v_0‖²_A ≤ ω (B_0 v_0, v_0).


Substituting c = ρ(ε)^{1/2} and t = (Σ_{i=1}^J (B_i v_i, v_i)/(B_0 v_0, v_0))^{1/2}, the trivial inequality (1+ct)²/(1+t²) ≤ 1+c² yields C_U ≤ ω(1 + ρ(ε)). Thus, from Theorem 4.3.1 we obtain cond(M^{-1}A) ≤ C_0² ω (ρ(ε) + 1), which was to be proved. □

It is easy to see that conditions (4.17), (4.18) and (4.20) are indeed identical to (i), (ii) and (iii) of Theorem 2.1.1.

4.4 Unextended Hybrid Schwarz Algorithm

Let us set

    V = V_0 + Σ_{i=1}^J V_i,  A: V → V.

Denote by P = P_{V_0} the (A·,·)-orthogonal projection onto V_0, and by P_i the (·,·)-orthogonal projection onto 𝒱_i = Range(Z_i). Adopting the above notation, we may describe the preconditioner by the following algorithm:


Algorithm 12. For a given r ∈ V:
1. Solve q ∈ V_0: (Aq, v) = (r, v) ∀v ∈ V_0.
2. Set s = r − Aq.
3. Solve u_i ∈ V_i: (B_i u_i, v_i) = (s, v_i) ∀v_i ∈ V_i, i = 1,...,J.
4. Set u = Σ_{i=1}^J u_i.
5. Solve u_0 ∈ V_0: (A(u − u_0), v_0) = (r, v_0) ∀v_0 ∈ V_0.
6. Output z = M^{-1}r = u − u_0.

It is easy to see that with the preconditioner described by Algorithm 12 the preconditioned operator will be

    M^{-1}A = P + (I − P) (Σ_{i=1}^J T_i) (I − P),

with the approximate projections T_i of Section 4.3.2. When using exact subdomain solvers, step 1 of Algorithm 12 needs to be performed only once at the start of the iteration and can then be omitted. This yields

    M^{-1}A = (I − P) (Σ_{i=1}^J T_i) (I − P).

The following theorem gives a condition number estimate.

Theorem 4.4.1. Let the following two assumptions be satisfied:

(i) There exists a linear mapping u ∈ V ↦ (u_0, u_1,...,u_J), u_i ∈ V_i, such that u = u_0 + Σ_{i=1}^J u_i and

    Σ_{i=1}^J (B_i u_i, u_i) ≤ C_0 (Au, u).    (4.21)

(ii) For all v_1,...,v_J, v_i ∈ V_i, and v = Σ_{i=1}^J v_i it holds that

    (Av, v) ≤ C_1 Σ_{i=1}^J (B_i v_i, v_i).    (4.22)

Then

    cond(M^{-1}A) ≤ C_0 C_1.    (4.23)


Proof. We will use the Matsokin–Nepomnyaschikh hybrid framework with the following choice of spaces and operators:

    H_0 = (I − P)V,  H = V_1 × ... × V_J,  R((v_i)_{i=1}^J) = Σ_{i=1}^J (I − P)v_i,  B = diag(B_i),  Tu = (u_i)_{i=1}^J

(Tu is the decomposition from assumption (4.21)). Then we have R ∘ T = I_{H_0}, as for u ∈ (I − P)V,

    (R ∘ T)u = (I − P)(u_1 + ... + u_J) = (I − P)(u − u_0) = (I − P)u = u.

We can estimate

    ‖(I − P) Σ_{i=1}^J v_i‖²_A ≤ ‖Σ_{i=1}^J v_i‖²_A ≤ C_1 Σ_{i=1}^J (B_i v_i, v_i).

Therefore ‖R‖_{B→A} ≤ √C_1. Finally, we have

    ‖Tu‖²_B = Σ_{i=1}^J ‖u_i‖²_{B_i} ≤ C_0 ‖u‖²_A,

i.e., ‖T‖_{A→B} ≤ √C_0. Now (4.23) easily follows from Theorem 4.3.1. □

4.4.1 BDD as a Hybrid Schwarz Algorithm

We will show that the original BDD method is a hybrid Schwarz algorithm as described by Algorithm 12.


Let us set

    V = V_0 + Σ_{i=1}^J V_i,  V_i = N_i D_i (I − P_i) W_i,  V_0 = Range(Σ_{i=1}^J N_i D_i Z_i),
    A = S,  B_i = N_i D_i^{-1} S_i D_i^{-T} N_iᵀ,

where W_i are the local interface spaces. With this choice of components, steps 1 and 5 of Algorithm 12 are just the pre- and post-balancing steps as in the original BDD method. Investigating step 3, we further have

    (B_i u_i, v_i) = (s, v_i)  ∀v_i ∈ V_i,  where v_i = N_i D_i (I − P_i) ṽ_i.

Thus, writing u_i = N_i D_i ũ_i and using N_iᵀ N_i = I and the symmetry of D_i,

    (S_i ũ_i, (I − P_i) ṽ_i) = (D_iᵀ N_iᵀ s, (I − P_i) ṽ_i)  ∀ṽ_i,

or

    (I − P_i) S_i ũ_i = (I − P_i) D_iᵀ N_iᵀ s.

Denoting û_i = (I − P_i)ũ_i (which assures solvability), we have

    S_i û_i = (I − P_i) D_iᵀ N_iᵀ s.


Now we have

    u_i = N_i D_i (I − P_i) S_i^+ (I − P_i) D_iᵀ N_iᵀ s,

which is the same as in step 2 of the original BDD method.

4.4.2 A New Look at the Conditioning of BDD

In Section 4.4.1 we have verified that, with a proper choice of spaces and operators, the original BDD method can be derived as a hybrid Schwarz method fitting the abstract framework introduced by Matsokin and Nepomnyaschikh. Therefore, the estimate of Theorem 2.7.1 is valid. We can, however, easily prove the result of Theorem 2.7.1 within our current framework, and we do so in the next lemma.

Lemma 4.4.2. The BDD algorithm yields a preconditioner satisfying

    cond(M^{-1}S) ≤ C (1 + log(H/h))².

Proof. In view of the previous section, it suffices to verify the assumptions of Theorem 4.4.1. First, for any u, we prove the existence of C_0 > 0 so that there exists a decomposition u = u_0 + Σ_{i=1}^J u_i with

    Σ_{i=1}^J (B_i u_i, u_i) ≤ C_0 (Su, u).

Let us define a decomposition of u:

    u_i = N_i D_i (I − P_i) N_iᵀ u,  u_0 = Σ_{i=1}^J N_i D_i P_i N_iᵀ u.

Then we have

    Σ_{i=1}^J (B_i u_i, u_i) = Σ_{i=1}^J (N_i D_i^{-1} S_i D_i^{-T} N_iᵀ N_i D_i (I − P_i) N_iᵀ u, N_i D_i (I − P_i) N_iᵀ u)


    Σ_{i=1}^J (B_i u_i, u_i) = Σ_{i=1}^J (S_i (I − P_i) N_iᵀ u, (I − P_i) N_iᵀ u) ≤ Σ_{i=1}^J (S_i N_iᵀ u, N_iᵀ u) = (Su, u).

Therefore, C_0 = 1. Let us now prove the existence of a positive constant C_1 so that

    (S Σ_{i=1}^J v_i, Σ_{j=1}^J v_j) ≤ C_1 Σ_{i=1}^J (B_i v_i, v_i)  ∀v_i ∈ V_i.

By the standard process of subassembly, the definition of V_i, and the Cauchy–Schwarz inequality, we have, writing v_i = N_i D_i (I − P_i) ṽ_i,

    (S Σ_{i=1}^J v_i, Σ_{j=1}^J v_j) = Σ_{i,j,l=1}^J (S_l N_lᵀ v_i, N_lᵀ v_j)
        = Σ_{i,j,l=1}^J (S_l N_lᵀ N_i D_i (I − P_i) ṽ_i, N_lᵀ N_j D_j (I − P_j) ṽ_j)
        ≤ C Σ_{l=1}^J ‖Σ_{i=1}^J N_lᵀ N_i D_i (I − P_i) ṽ_i‖²_{S_l}
        ≤ C_1 Σ_{i=1}^J (B_i v_i, v_i),

where the last step is the energy estimate for the weighted, cut-off interface functions, which yields C_1 ≤ C(1 + log(H/h))² exactly as in the proof of Theorem 2.7.1, concluding the proof. □

Remark 4.4.3. We see that under the assumption that Range(Z_i) = Ker(S_i), the condition number is bounded by the same constant as in the original BDD method.


4.5 Extended Hybrid Schwarz Algorithm

As we already mentioned at the beginning of Chapter 4, the key to the application of inexact local solvers is the formulation of the method in terms of the local stiffness matrices rather than the local Schur complement matrices. This means that the vectors we will use will include the components corresponding to the interior degrees of freedom. As using the term "extended" seems unfortunate to the author in the presence of the operators of discrete harmonic extension, we will use the term long vectors to distinguish these vectors from the ones defined on the interfaces only.

4.5.1 Algorithm on Long Vectors with Exact Components

Let Ṽ denote the space including the interiors and Ṽ_H the space of discrete harmonic functions. Further, let W̃_i = Range(Ñ_i). Let us define the local spaces V_i, the interior spaces

    V̊_i = N̊_i W̊_i = Ñ_i Ñ_iᵀ (I − E) Ṽ,  i = 1,...,J,

and the coarse space Ṽ_0. We now have two sets of local operators, B_i = N_i D_i^{-1} S_i D_i^{-T} N_iᵀ and the bilinear forms restricted to the interiors, B̊_i = N̊_i N̊_iᵀ A N̊_i N̊_iᵀ. Here E denotes the operator of discrete harmonic extension E: Ṽ → Ṽ; in matrix notation, with the block ordering (interiors, interfaces),

    E (x_1, x_2)ᵀ = (−A_11^{-1} A_12 x_2, x_2)ᵀ.
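In terms of this block splitting, the action of E amounts to one Dirichlet solve per subdomain; a minimal sketch (SciPy, with the block matrices as assumed inputs):

    from scipy.sparse.linalg import spsolve

    def harmonic_extension(A11, A12, x2):
        """Discrete harmonic extension: given interface values x2, fill in
        the interior values x1 so that the residual vanishes in the
        interiors, i.e. A11 x1 + A12 x2 = 0 (the block formula for E)."""
        return spsolve(A11.tocsc(), -A12 @ x2)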


Since the discrete harmonic extension of a function is uniquely determined by the function's values on the subdomain interfaces, we allow E to also be used as an operator E: V(Γ) → Ṽ. For the sake of convenience of notation, let

    Ṽ_i = V_i for i = 1,...,J,  Ṽ_i = V̊_{i−J} for i = J+1,...,2J,

and

    B̃_i = B_i for i = 1,...,J,  B̃_i = B̊_{i−J} for i = J+1,...,2J.

With the above definitions, we can write the algorithm on long vectors.

Algorithm 13. Set q = 0.
1. Pre-balance: compute q ∈ Ṽ_0: (Aq, v) = (r, v) ∀v ∈ Ṽ_0.
2. Set s = r − Aq.
3. Compute u_i ∈ V_i: (B_i u_i, v) = (s, v) ∀v ∈ V_i, and ů_i ∈ V̊_i: (B̊_i ů_i, v) = (s, v) ∀v ∈ V̊_i, i = 1,...,J.
4. Set u = Σ_{i=1}^J (u_i + ů_i).
5. Post-balance: compute u_0 ∈ Ṽ_0: (A(u − u_0), v) = (r, v) ∀v ∈ Ṽ_0.
6. Output z = M^{-1}r = u − u_0.

Remark 4.5.1. Again, if all the components of the method used in solving the pre- and post-balancing steps are exact, the pre-balancing needs to be performed only once before the iteration commences and may be omitted in subsequent iterations.


4.5.2 Practical Algorithm on Long Vectors and ACDD

Algorithm 13 as written is not very practical, especially because it requires knowledge of the local Schur complements S_i contained in the definition of B_i. We will rewrite the algorithm in a form suitable for a computer implementation.

Algorithm 14. Set q = 0.
1. Pre-balance: solve q ∈ Ṽ_0: (Aq, v) = (r, v) ∀v ∈ Ṽ_0.
2. Set s = r − Aq, where s = (s_1, s_2)ᵀ, and condense the residual onto the interfaces:
   (1) compute w = A_11^{-1} s_1 (J Dirichlet solves);
   (2) set s̃_2 = s_2 − A_21 w.
3. Solve J Neumann problems

    A_i (ů_i, ũ_i)ᵀ = (0, (I − P_i) D_iᵀ N_iᵀ s̃_2)ᵀ,  i = 1,...,J,

   and discard the interior components ů_i.
4. Set ū = Σ_{j=1}^J N_j D_j ũ_j.
5. Extend ū harmonically into the subdomain interiors, with the interior values d_i computed by J Dirichlet solves, i = 1,...,J; denote the resulting long vector by u.
6. Post-balance: compute u_0 ∈ Ṽ_0 such that (A(u − u_0), v) = (r, v) ∀v ∈ Ṽ_0.
7. Output z = M^{-1}r = u − u_0.


Remark 4.5.2. The balancing steps consist of solving a linear system of algebraic equations By = f, where B is a symmetric positive (semi)definite matrix with the entries

    B_ij = Z_iᵀ D_iᵀ N_iᵀ Eᵀ A E N_j D_j Z_j,

and the right-hand side has the entries f_i = Z_iᵀ D_iᵀ N_iᵀ Eᵀ r.

Remark 4.5.3. For exact subdomain and coarse space solvers, Algorithm 14 reduces to Algorithm 12, and is thus mathematically equivalent to BDD.

Remark 4.5.4. A total of mJC Dirichlet solves will have to be used in the setup to compute the basis of testing functions used in the construction of the balancing matrix B, where

    m = max_{i=1,...,J} dim(Ker(A_i)),  C = max_{i=1,...,J} card{j : ∂Ω_i ∩ ∂Ω_j ≠ ∅}.

In step 4, the term Σ_{j=1}^J ũ_j will be precomputed to be used for all values of i. The cost involved in the computation of steps 1, 2, 3, 4 and 5 is roughly that of solving 2J Dirichlet problems and J Neumann problems. This algorithm, however, is only an intermediate step: the local solves will ultimately be replaced by iterations. We will code-name the method ACDD (standing for domain decomposition with approximate components). We will write ACDD(∞) to denote the method with all components exact, i.e., the method mathematically equivalent to BDD. On occasion, we will write ACDD(k) to denote the method where some or all of the local solves were replaced by inexact ones; the quantity k is the number of iterations of the inexact solvers performed.
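To fix ideas, here is a compact sketch of one application of the balanced hybrid preconditioner in the spirit of Algorithms 12 and 14 (a simplified rendering under assumed data structures, not the thesis code): local_solve stands for either an exact Neumann solve or k steps of an inexact iteration, coarse_solve for the balancing solve with the matrix B of Remark 4.5.2, and the restriction by index sets glosses over the weight matrices D_i.

    import numpy as np

    def acdd_apply(r, A, subdomains, coarse_solve, local_solve):
        """One application z = M^{-1} r of a balanced (hybrid) Schwarz
        preconditioner: pre-balance, local solves, post-balance.
        coarse_solve(g) returns the coarse-space element w with
        (A w, v) = (g, v) for all coarse v; all names are assumptions."""
        # steps 1-2: pre-balance and residual update
        q = coarse_solve(r)
        s = r - A @ q
        # steps 3-4: local (possibly approximate) solves on the subdomains
        u = q.copy()
        for dofs in subdomains:
            u[dofs] += local_solve(dofs, s[dofs])
        # steps 5-6: post-balance so that the final residual is
        # A-orthogonal to the coarse space (equivalent to Algorithm 12
        # when the coarse solves are exact)
        u0 = coarse_solve(A @ u - r)
        return u - u0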


4.5.3 Estimate on Long Vectors for Inexact Neumann Solvers

In this section we give a condition number bound assuming that the harmonic extensions E and the coarse space solver are exact, but the Neumann solves in step 3 of Algorithm 14 are replaced with an iterative process of the form (4.2), with the rate of convergence given by (4.3).

Lemma 4.5.5. Using Algorithm 14 with the local Neumann problems replaced by a consistent iterative process (4.2), with the rate of convergence bounded by q ∈ (0,1) uniformly with respect to the number of subdomains, we have

    cond(M^{-1}A) ≤ C ((1+q)/(1−q)) (1 + log(H/h))².

Proof. We will apply Theorem 4.3.1 again. The only components affected by the use of approximate Neumann solves are the operators B_i, i = 1,...,J. For a discrete harmonic vector (x_1, x_2)ᵀ, i.e., x_1 = −A_11^{-1}A_12 x_2, we have

    (A_i (x_1, x_2)ᵀ, (x_1, x_2)ᵀ) = (S_i x_2, x_2),

so the operators B̄_i resulting from the application of the iterative solvers satisfy B̄_i ≈ B_i, where the constants of the equivalence can be bounded (cf. (4.4)) from above by 1/(1−q) and from below by 1/(1+q). For a given u ∈ Ṽ, let us set

    u_i = E N_i D_i (I − P_i) N_iᵀ u,


    ů_i = N̊_i N̊_iᵀ (I − E) u.

The above defines a decomposition of u into its V_i and V̊_i components. Using the definition of V̊_i, the identity N_iᵀ E N_i = I, the process of subassembly, and (4.4),

    Σ_{i=1}^{2J} (B̃_i u_i, u_i) ≤ (1/(1−q)) ( Σ_{i=1}^J (B_i u_i, u_i) + Σ_{i=1}^J (B̊_i ů_i, ů_i) )
        ≤ (1/(1−q)) Σ_{i=1}^J ( (S_i (I−P_i) N_iᵀ u, (I−P_i) N_iᵀ u) + (N̊_i A_i N̊_iᵀ (I−E)u, (I−E)u) )
        ≤ (1/(1−q)) ( (AEu, Eu) + (A(I−E)u, (I−E)u) )
        = (1/(1−q)) (Au, u),

so C_0 = 1/(1−q). Let v ∈ V and v_i ∈ Ṽ_i be given such that v = Σ_i v_i. Then

    (A (Σ_{i=1}^J v_i + Σ_{i=1}^J v̊_i), Σ_{i=1}^J v_i + Σ_{i=1}^J v̊_i) = (A Σ v_i, Σ v_i) + (A Σ v̊_i, Σ v̊_i),

since the harmonic and the interior parts are A-orthogonal. The interior term is trivially estimated using the identity (A v̊_i, v̊_i) = (B̊_i v̊_i, v̊_i). Using the subassembly, the definition of V_i, and the definition of the Schur complement, the harmonic term can be estimated, writing v_i = E N_i D_i (I − P_i) ṽ_i, as

    (A Σ_{i=1}^J v_i, Σ_{j=1}^J v_j) = Σ_{i,j,l=1}^J (A_l N_lᵀ v_i, N_lᵀ v_j)
        = Σ_{i,j,l=1}^J (A_l N_lᵀ E N_i D_i (I−P_i) ṽ_i, N_lᵀ E N_j D_j (I−P_j) ṽ_j)
        = Σ_{l=1}^J Σ_{i,j=1}^J (S_l N_lᵀ N_i D_i (I−P_i) ṽ_i, N_lᵀ N_j D_j (I−P_j) ṽ_j).


An application of the Cauchy–Schwarz inequality yields

    (A Σ_{i=1}^J v_i, Σ_{j=1}^J v_j) ≤ C Σ_{l=1}^J ‖Σ_{i=1}^J N_lᵀ N_i D_i (I−P_i) ṽ_i‖²_{S_l}.

From here it follows that

    C_1 ≤ (1+q) sup_{v_i ∈ Ṽ_i} (A Σ_i v_i + Σ_i v̊_i, Σ_i v_i + Σ_i v̊_i) / Σ_{i=1}^J ‖(I−P_i) ṽ_i‖²_{S_i} ≤ (1+q) C̄_1,

where C̄_1 ≤ C(1 + log(H/h))² is the constant of the corresponding exact-component estimate (cf. Lemma 4.4.2), and the result follows from Theorem 4.3.1. □

4.5.4 Inexact Coarse Space and Harmonic Extensions

In this section we will investigate the case of Algorithm 14 with inexact harmonic extensions and an approximate coarse space. Two approximate coarse spaces will be considered. The first one is obtained by replacing the exact discrete harmonic extension by one that extends a function from a subdomain interface exactly, but only into the adjacent subdomains sharing a face or an edge (in 3D), or sharing an edge (in 2D), with the given subdomain; that is, the extension will be zero across corners. This reduces the fill-in in the coarse-level operator. We assume that Algorithm 13 is used with the inexact discrete harmonic extension Ē instead of E, and that Ê is used in the definition of the coarse space.

Definition 4.5.6. Let E denote the operator of discrete harmonic extension and let u be a vector whose components are nonzero only in Ω̄_i. We define the operator Ê as follows:

    (Êu)_l = 0 for degrees of freedom l in a subdomain Ω̄_j for which Ω̄_i ∩ Ω̄_j is a single point,
    (Êu)_l = (Eu)_l otherwise.


The second approximate coarse space will be based on consistently using the same iterative method for computing the action of the discrete harmonic extension operator throughout the algorithm. Let us first consider the inexact coarse space with reduced fill-in. We will need the following lemma in our estimates.

Lemma 4.5.7. For the operator Ê defined by Definition 4.5.6, it holds that

    (A(E − Ê)w, (E − Ê)w) ≤ C (AEw, Ew) ≤ C (1 + log(H/h))² (Aw, w).

Proof. Let us denote by E_P1 and E_Q1 the discrete harmonic extensions on the spaces of P1 and Q1 elements, respectively. From the discrete harmonic extension theorem (Widlund [92]) it follows that the energy of the reduced extension on each subdomain is bounded, up to a constant, by the energy of the exact discrete harmonic extension; the first inequality of the lemma follows. The second inequality was proved in [65]. □

Lemma 4.5.8. Using Algorithm 14 with exact local Neumann solvers, inexact harmonic extensions Ē, and the inexact harmonic extension Ê in the definition of the coarse space as a preconditioner, the condition number is bounded by

    cond(M^{-1}A) ≤ C (1 + (q/h)²) (1 + log(H/h))².

Proof. Applying Theorem 4.3.1 again, we define a decomposition of u into its V_i, Ṽ_0 and V̊_i components:

    u_i = Ē N_i D_i (I − P_i) N_iᵀ u,  u_0 = Ê Σ_{j=1}^J N_j D_j P_j N_jᵀ u,


    ů_i = N̊_i N̊_iᵀ (I − Ē) u + N̊_i N̊_iᵀ (Ē − Ê) w,  where w = Σ_{j=1}^J N_j D_j P_j N_jᵀ u.

Using the definition of V̊_i and the identity N_iᵀ E N_i = I, we have

    Σ_i (B̃_i u_i, u_i) = Σ_{i=1}^J (B_i u_i, u_i) + Σ_{i=1}^J (N̊_i A_i N̊_iᵀ ((Ē−Ê)w + (I−Ē)u), (Ē−Ê)w + (I−Ē)u).

Thus, using the Cauchy–Schwarz inequality several times, Lemma 4.2.2 and the definition of the discrete harmonic extension,

    Σ_i (B̃_i u_i, u_i) ≤ C ( (1 + (q/h)²)(Au, u) + (A(E−Ê)w, (E−Ê)w) ).

Now an application of Lemma 4.5.7 yields C_0 ≤ C (1 + (q/h)²)(1 + log(H/h))².

Let v ∈ V and v_i ∈ Ṽ_i be given such that v = Σ_i v_i; write v_i = Ē N_i D_i (I − P_i) ũ_i and set ū = Σ_{i=1}^J N_i D_i (I − P_i) ũ_i. Since Ē ū − E ū and the v̊_i are supported in the interiors, while E ū is discrete harmonic,

    (A(Σ_i v_i + Σ_i v̊_i), Σ_i v_i + Σ_i v̊_i)
        = (A E ū, E ū) + (A((Ē−E)ū + Σ_i v̊_i), (Ē−E)ū + Σ_i v̊_i)
        ≤ (A E ū, E ū) + 2 (A(Ē−E)ū, (Ē−E)ū) + 2 (A Σ_i v̊_i, Σ_i v̊_i),


and an application of Lemma 4.2.2 yields

    (A(Σ_i v_i + Σ_i v̊_i), Σ_i v_i + Σ_i v̊_i) ≤ (1 + C(q/h)²)(A E ū, E ū) + 2 (A Σ_i v̊_i, Σ_i v̊_i).

The rest of the proof is identical to the proof of Lemma 4.5.5, yielding

    C_1 ≤ C (1 + (q/h)²)(1 + log(H/h))²,

from where the result follows. □

Remark 4.5.9. If the same inexact harmonic extension were used for the coarse space as the one used in the definition of V_i, i.e., Ê = Ē, then the fill-in in the coarse space matrix is not reduced, but we obtain a better condition number estimate.

We sum up the results of this section in the following theorem.

Theorem 4.5.10. If the same inexact harmonic extension is used for the coarse space as the one used in the definition of V_i, i.e., Ê = Ē, then

    cond(M^{-1}A) ≤ C (1 + (q/h)²) (1 + log(H/h))².

Proof. The proof follows from Lemma 4.5.8 and the condition number estimate for Balancing Domain Decomposition given in Theorem 2.7.10. □


4.5.5 Computational Complexity

Let n denote the number of degrees of freedom in the (unreduced) system, N_es the typical number of elements per subdomain, and d the spatial dimension. For the sake of simplicity, we will analyze the computational complexity only in the two limit cases: the case of implementation on a serial architecture and the case of implementation on an ideal parallel computer (i.e., one having as many processors as we desire). We assume that the kernel dimensions of the local stiffness matrices are uniformly bounded. Also, in the case of exact components we make the assumption that 1 + log(H/h) is bounded by a constant. In view of Theorems 2.7.10, 4.5.10 and 1.4.1, this means that we have to perform only O(1) iterations to reduce the error to a desired size. Under these assumptions, let us summarize the computational requirements of three methods: the original BDD, the method on long vectors with exact components, denoted ACDD(∞), and the method on long vectors with all components approximate, denoted ACDD(k).

BDD requires:

• O((n/N_es) · N_es^{(3d−2)/d}) operations to compute the factorizations of the local matrices.
• O(mJC · N_es^{(2d−1)/d}) operations to precompute the coarse space basis functions.
• O(n) operations to assemble the coarse space problem.
• O((n/N_es)^{(3d−2)/d}) operations to compute the Cholesky factorization of the coarse problem.
• O((n/N_es) · N_es^{(2d−1)/d}) operations to compute the local subdomain solves and harmonic extensions.
• O((n/N_es)^{(2d−1)/d}) operations to compute the coarse level solution.

Balancing the amount of work required, we obtain after trivial calculations that for the serial version of both the BDD and ACDD(∞) methods the optimal size of subdomains is N_es = n^{(2d−2)/(5d−4)}, i.e., in 2D N_es = n^{1/3}, resulting in the computational complexity O(n^{4/3}), and in 3D N_es = n^{4/11}, resulting in the overall computational complexity of O(n^{49/33}) (a worked instance of this balance is given after Theorem 4.5.11 below).

If a computer with sufficiently many processors were available, the computation of the coarse space basis as well as all the subdomain factorizations and solves could be performed in parallel. Then the optimal size of a subdomain would be N_es = n^{1/2} in both 2D and 3D, resulting in an overall cost of O(n) in 2D and O(n^{7/6}) in 3D. We have shown that the complexity will, even in a serial implementation, be lower than just the back-substitution step of a direct method based on a factorization, which is O(n^{3/2}) and O(n^{5/3}) in 2D and 3D, respectively.

ACDD(k) requires:
• O(n) operations to set up the iterations for the local matrices.
• O(mJC · N_es) operations to precompute the coarse space basis functions.


• O(n) operations to assemble the coarse space problem from the precomputed basis.
• O((n/N_es)^{(3d−2)/d}) operations to compute the Cholesky factorization of the coarse problem.
• O((n/N_es) · N_es) = O(n) operations to compute the local subdomain solves and harmonic extensions.
• O((n/N_es)^{(2d−1)/d}) operations to compute the coarse level solution.

For ACDD(k), assuming that the number of iterations needed to reduce the error to the truncation level is O(1), we obtain on a single processor the optimal size of subdomains N_es = n^{(2d−2)/(3d−2)}, i.e., in 2D N_es = n^{1/2} and in 3D N_es = n^{4/7}, yielding complexity O(n) in both cases. We have thus proved the following theorem.

Theorem 4.5.11. In order to solve the discrete problem to the truncation level by the ACDD(∞) method with exact components on a serial architecture, we need to perform O(n^{4/3}) and O(n^{49/33}) operations in 2D and 3D, respectively. Parallel implementation on a computer with about n^{1/2} processors reduces these estimates to O(n) and O(n^{7/6}) per processor, respectively. For the ACDD(k) method with inexact components, the amount of work on a serial machine is O(n) in both 2D and 3D.
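The balancing behind these exponents can be made explicit (a worked check of the "trivial calculations" above, using the 3D serial costs just listed): the dominant terms for ACDD(∞) are the local factorizations, O(n N_es^{4/3}), and the coarse Cholesky factorization, O((n/N_es)^{7/3}). Equating the two,

    n N_es^{4/3} = (n/N_es)^{7/3}  ⟹  N_es^{11/3} = n^{4/3}  ⟹  N_es = n^{4/11},

so the total serial work is n · (n^{4/11})^{4/3} = n^{49/33}, as stated in Theorem 4.5.11. The same computation with d = 2 gives n N_es = (n/N_es)², hence N_es = n^{1/3} and work O(n^{4/3}).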


5. Two-Level Multigrid Alternative

5.1 Alternative For Inexact Solvers

In Chapter 3 and Chapter 4 we devised overlapping and substructuring domain decomposition methods allowing for the use of inexact subdomain solvers. In this section, we describe another way to implement a domain decomposition algorithm, one avoiding inexact solvers in the traditional sense altogether. Although an independent method, it shares some components with the method proposed in Section 3.2. Both methods are based on the concept of the smoothed transfer operator introduced in its current form by Vaněk et al. in [87]. Its analysis is based on simple algebraic arguments and is close to that of a two-level multigrid.

The input data of the method are the system of linear algebraic equations (1.5) with a symmetric positive definite matrix A_T resulting from the discretization of problem (1.4) on conforming P1 or Q1 finite elements with the mesh T_h,

    A_T x = b_T,    (5.1)

and a system of J closed disjoint subdomains {Ω_i}_{i=1}^J on Ω such that each subdomain Ω_i is a closed aggregate of elements. We assume that each node of the underlying finite element mesh belongs to exactly one of these subdomains; thus the system covers all the nodes of the finite element mesh on the domain Ω. Note that we do not require the set of subdomains to cover the entire Ω. No other input is required for the solution of scalar second order elliptic problems. Simple


modifications necessary for the adaptation of the method to problems of elasticity will be noted.

In order to eliminate the undesirable influence of possible variation of the coefficients in the problem, instead of solving problem (5.1) we will formulate our iterative method for the problem

    Ax = b    (5.2)

with the diagonally scaled matrix

    A = D^{-1/2} A_T D^{-1/2},  b = D^{-1/2} b_T,

where D = diag(A_T) is the diagonal of A_T. This diagonal scaling reduces the spectral condition number of the problem (cf. [40]) and nondimensionalizes the system of equations [48]. This last aspect is especially useful when solving problems mixing degrees of freedom having different physical interpretations, such as the problems of plates and shells (cf. [56]). As the new matrix A is independent of the scaling of the basis functions, we may assume without loss of generality that the functions of our finite element basis satisfy

    a(φ_i, φ_i) = 1 for all i.    (5.3)

5.2 Tentative Prolongator and Standard Two-level Multigrid

Choosing a prolongator P and some pre- and post-smoothers S_s, S_s', the classic two-level variational method may be written as follows:

Algorithm 15 ([45]). For the given initial guess x ∈ R^n, repeat
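In matrix terms the rescaling is a one-liner; a minimal SciPy sketch (the function name is ours):

    import numpy as np
    import scipy.sparse as sp

    def diagonal_scaling(A_tau, b_tau):
        """Form A = D^{-1/2} A_tau D^{-1/2} and b = D^{-1/2} b_tau with
        D = diag(A_tau); the unscaled solution is recovered afterwards
        as x_tau = D^{-1/2} x."""
        dinv_sqrt = 1.0 / np.sqrt(A_tau.diagonal())
        D = sp.diags(dinv_sqrt)
        return D @ A_tau @ D, D @ b_tau, dinv_sqrt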


1. x ← S_s(x, b),
2. solve the coarse level problem PᵀAP v = Pᵀ(Ax − b),
3. x ← x − Pv,
4. x ← S_s'(x, b)
until convergence.

An appropriate choice of the components, most importantly of the prolongation operator P, is the key to the efficiency of the two-grid method. Let us first investigate the method with a prolongator given by unknowns aggregation. Using a system of nonoverlapping subdomains and the aggregation technique (cf. Section 3.2), we will construct a tentative coarse space of possibly a very small dimension. We label it tentative for the same reason as in Chapter 3, namely because we will ultimately construct an improvement based on it.

Let N_n denote the number of nodes in the discretization mesh T_h, F the index set of all unconstrained nodes, and N_f = card(F). We introduce a one-to-one mapping N: {1,...,N_f} → F that establishes the correspondence between the degrees of freedom of the constrained space V and its unconstrained counterpart Ṽ; i.e., for a degree of freedom i in V, N(i) is the number of the corresponding degree of freedom in Ṽ. Let us consider the decomposition of unity u¹ ∈ R^{N_n} defined by

    Σ_{i=1}^{N_n} u¹_i φ_i = 1 on Ω.    (5.4)

Note that in practice the finite element bases for solving second order elliptic problems often satisfy u¹_i = 1, i = 1,...,N_n. As noted, the subdomains Ω_i do


not cover the entire Ω, but since they contain all the nodes of the mesh, we can define an N_f × J tentative prolongator matrix P̄ by the following construction:

    P̄_ij = u¹_{N(i)} if the node v_{N(i)} belongs to the subdomain Ω_j,  P̄_ij = 0 otherwise.    (5.5)

Note that this version of the tentative prolongator differs from the one used in Section 3.2 only by the diagonal scaling. The following lemma summarizes several useful results well known from the theory of the two-level method [67], [45]. As these are classic results, we shall omit their proofs.

Lemma 5.2.1. Let B be a symmetric positive definite matrix on R^n, p: R^m → R^n a full-rank prolongation operator and T = Ker(pᵀB). Then

(1) I − p(pᵀBp)^{-1}pᵀB is a B-orthogonal projection onto T.

(2) If S̄ = I − (ω/ρ(B))B, ω ∈ (0,2), then for every x ∈ R^n

    ‖S̄x‖²_B ≤ ‖x‖²_B − (ω(2−ω)/ρ(B)) ‖Bx‖².

(3) ‖S̄[I − p(pᵀBp)^{-1}pᵀB]‖_B ≤ ‖S̄_T‖_B, where S̄_T denotes the restriction of the operator S̄ to the set T.

(4) Let the following weak approximation property be fulfilled: there exists a constant C_apx > 0 such that for every u ∈ R^n there exists a v ∈ R^m such that

    ‖u − pv‖ ≤ C_apx ρ^{-1/2}(B) ‖u‖_B.    (5.6)


Then

    ‖Bx‖/‖x‖_B ≥ ρ^{1/2}(B)/C_apx  for every x ∈ T.    (5.7)

Corollary 5.2.2. If we choose the damped Jacobi method with the error propagation operator S̄ = I − (ω/ρ(A))A as the post-smoother in the two-level Algorithm 15, we obtain

    ‖e_{i+1}‖²_A ≤ (1 − ω(2−ω)/C²_apx) ‖e_i‖²_A.

Remark 5.2.3. Note that the constant C_apx typically depends on the ratio of the resolutions of the fine and coarse spaces, H/h.

The following theorem summarizes the convergence properties of the two-level method with the prolongator given by unknowns aggregation.

Theorem 5.2.4. For the two-level method given by Algorithm 15 with the prolongator P = P̄ given by mere unknowns aggregation (5.5) and the damped Jacobi post-smoother, applied to solving the problem (5.2), the error estimate is

    ‖e_{i+1}‖²_A ≤ (1 − C^{-1} (h/H)² ω(2−ω)) ‖e_i‖²_A.

Proof. In view of Corollary 5.2.2, we only need to evaluate the constant in the weak approximation property (5.6). Let Π denote the finite element interpolation operator: for a vector α, Πα = Σ_i α_i φ_i. We will construct a vector v having components v_i equal to the average of Πu over the subdomain Ω_i.


From the definition (5.5) of the aggregation prolongator P̄ it follows that (P̄v)_j = v_i for every node v_j ∈ Ω_i. Using the equivalence of the Euclidean and the continuous L2 norms, we have

    ‖u − P̄v‖² ≤ C h^{-d} Σ_{i=1}^J ‖Π(u − P̄v)‖²_{L2(Ω_i)},

so the Poincaré inequality (A.4) yields

    ‖u − P̄v‖² ≤ C h^{-d} H² Σ_{i=1}^J |Πu|²_{H1(Ω_i)} ≤ C h^{-d} H² |Πu|²_{H1(Ω)} ≤ C (H/h)² ρ^{-1}(A) ‖u‖²_A.

Thus C_apx ≤ C(H/h), and the rest of the proof follows from Corollary 5.2.2. □

Computational experiments suggest that the estimate stated in Theorem 5.2.4 is sharp. This fact has led to the commonly accepted point of view that an algebraic method utilizing only two grids cannot be the basis of an efficient solver offering simultaneously low computational complexity and a rate of convergence independent of the ratio H/h. Indeed, in order to preserve a favorable rate of convergence, the estimate of Theorem 5.2.4 excludes the possibility of having a two-level method with a small coarse problem. This means that solving a problem using a two-level method under this limitation results in asymptotically the same complexity as solving the original problem by a direct solver, a procedure hardly worth the effort of implementing.
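For concreteness, the following sketch builds the tentative prolongator (5.5) from an aggregate index vector and performs one cycle of the two-level Algorithm 15 (our own minimal rendering; the aggregate numbering and the direct coarse solve are assumptions of the sketch, and the Richardson form of the smoother coincides with damped Jacobi for the unit-diagonal matrix (5.2)):

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import spsolve

    def tentative_prolongator(aggregate_of, J, u1=None):
        """P_bar from (5.5): row i has its single entry u1[i] in the
        column of the aggregate (subdomain) containing node i."""
        n = len(aggregate_of)
        vals = np.ones(n) if u1 is None else u1
        return sp.csr_matrix((vals, (np.arange(n), aggregate_of)),
                             shape=(n, J))

    def two_level_cycle(A, P, x, b, omega=0.7, rho=None):
        """One iteration of Algorithm 15 with damped Jacobi smoothers."""
        rho = rho or abs(A).sum(axis=1).max()       # cheap bound on rho(A)
        smooth = lambda x: x + (omega / rho) * (b - A @ x)
        x = smooth(x)                               # step 1: pre-smoothing
        Ac = (P.T @ A @ P).tocsc()                  # coarse operator P^T A P
        x = x - P @ spsolve(Ac, P.T @ (A @ x - b))  # steps 2-3
        return smooth(x)                            # step 4: post-smoothing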


In the next section we will show how to modify the two-level method to defeat this difficulty.

5.3 Modified Two-level Multigrid and MLS

In the previous section we have seen that the convergence of the algorithm based on the aggregation of unknowns alone is unsatisfactory. Because of the other practical advantages aggregation has to offer, various attempts to improve the convergence have been made (e.g., supercorrection [6, 7]). These attempts did not, however, address the cause of the poor behavior inherent to the problem, which is the fact that the range of the prolongation based on aggregation consists of functions with high energy, namely the piecewise constant functions.

This section describes a modified two-level substructuring method, the result of joint work with Jitka Křížková, Radek Tezaur and Petr Vaněk [88], for solving scalar elliptic problems with jumps in coefficients. We improve the convergence estimate obtained from the standard two-level multigrid theory using an algebraic lift made possible by the multilevel smoothing in the definition of the prolongator operator. As we aim at achieving good computational complexity, reducing the dependence of the rate of convergence on the coarse space size becomes our main objective. To this end we will tailor our pre-smoother, post-smoother and coarse space so that the two-level method based on their combination is capable of effectively eliminating all components of the error, provided that the pre-smoother distributes the information to a distance comparable with the


coarse space resolution. We will utilize the tentative prolongator constructed in (5.5) and apply to it a specially selected prolongator smoother. This will produce a smoothing effect on the range of the tentative prolongator. Our design choice is to take the linear part of the pre-smoother to play the role of the prolongator smoother. Our post-smoother will also be derived from the pre-smoother. This results in a flexible procedure which we can control by the choice of the pre-smoother. We will prove, under regularity-free assumptions, that a proper selection of the pre-smoother results in a rate of convergence independent of the size of the coarse space.

The method is determined by two components: a full-rank tentative prolongator P̄: R^m → R^n (m << n = rank(A)) and a pre-smoother, from which both the post-smoother and a prolongator smoother are derived. We will show that the proposed method is, up to a post-processing step, the standard variational two-level scheme with a smoothed prolongator SP̄ and special smoothing procedures.

Let x ← S_s(x, b) be a given pre-smoother, a linear iterative method consistent with (5.2), with a symmetric linear part S commuting with A. We select S to be our prolongator smoother. Let us define the post-smoother to be a linear iterative method consistent with (5.2) such that its error propagation operator is the matrix S' defined by

    S' = I − (ω/ρ̄(A_S)) A_S,  where A_S = S²A = SAS.    (5.8)

The scalar ω ∈ (0,2) is a given parameter and ρ̄(A_S) is an upper bound on the spectral radius ρ(A_S). Note that all of the above can be easily accomplished if S


is a polynomial in A. The algorithm can be written down as follows:

Algorithm 16 (MLS). For the given initial guess x ∈ R^n, repeat
1. x ← S_s(x, b),
2. solve the coarse level problem P̄ᵀSASP̄ u = P̄ᵀS(Ax − b),
3. x ← x − SP̄ u,
4. x ← S_s'(x, b)
until convergence;
5. Post-process x ← S_s(x, b).

Remark 5.3.1. a) Steps 1–4 of the algorithm form the standard multigrid two-level method given by the prolongator SP̄ and the smoothers S_s, S_s' (cf. the proof of Theorem 5.3.5 below). For such an algorithm, our theory gives an error estimate in the A_S-seminorm. However, using a smaller coarse space calls for a more powerful smoother S to obtain the optimal convergence result, so A_S depends on the coarse space, and the practical value of such an estimate is questionable.

b) From the convergence point of view, the postprocessing step 5, consisting of a single smoothing, seems out of place. But it is the addition of this very step that allows us to provide the same convergence estimate in the energy norm of the original problem (5.2). In order to demonstrate this, let e_i denote the error after i iterations given by steps 1 through 4 of Algorithm 16, and e_i^f the error after the application of the postprocessing step 5 to e_i, i.e., e_i^f = S e_i. Then, as A_S = S²A and ρ(S) ≤ 1, we have ‖e_0‖_{A_S} ≤ ‖e_0‖_A and ‖e_i^f‖_A = ‖e_i‖_{A_S}.


The last two relationships yield the S-independent convergence rate estimate

    ‖e_i^f‖_A ≤ η^i ‖e_0‖_A,    (5.9)

provided we can estimate

    ‖e_{i+1}‖_{A_S} ≤ η ‖e_i‖_{A_S}.    (5.10)

Remark 5.3.2. In Chapter 6 we will refer to the practical implementation of the method of Algorithm 16, with the suitable choice of components described below, as MLS, an abbreviation standing for Black-box Nonoverlapping Method with Multilevel Smoothing.

The following assumption specifies the requirements on S, P̄ and ρ̄(A_S) in a form suitable for our purposes.

Assumption 5.3.3. There exist positive constants C_1, C_2 independent of m, n, and a constant C_D(m, n), such that:

1. There is a mapping Q_c: R^n → Range(P̄) such that

    ‖(I − Q_c)u‖ ≤ C_1 C_D(m, n) ρ̄^{-1/2}(A) ‖u‖_A  ∀u ∈ R^n.    (5.11)

2. The prolongator smoother S is symmetric, commutes with A, satisfies ρ(S) ≤ 1 and the smoothing property of Hackbusch in the form

    ρ(S²A) ≤ ρ̄(S²A) ≤ C_2² C_D^{-2}(m, n) ρ(A).    (5.12)

Remark 5.3.4. We note that (5.6) follows from (5.11). Adding the requirement (5.12) is necessary to guarantee the improved convergence. A pair of requirements similar to (5.11), (5.12) is very common in multigrid theory [45, 14]. In our estimates, the purpose of the constant C_D(m, n) is to absorb


the dependence of the estimates (5.11) and (5.12) on the dimensions m, n. When verifying the assumption above, the goal is to show that the constants C_1 and C_2 are either m, n-independent or depend on m and n only weakly (e.g., polylogarithmically). As the convergence rate estimate will turn out to depend only on the constants C_1, C_2, it will thus be m, n-independent.

Let us recall Algorithm 6 of Section 3.2, where we have constructed the prolongator smoother S to be a polynomial in A satisfying (Lemma 3.3.2)

    ρ(S²A) ≤ C ρ(A)/deg²(S).

Typically, for second order elliptic problems it is possible to prove the weak approximation property in the form

    ‖(I − Q_c)u‖ ≤ C (H/h) ρ̄^{-1/2}(A) ‖u‖_A,

where h is the fine space characteristic resolution (meshsize) and H is the tentative coarse space resolution (we have obtained this result in the proof of Theorem 5.2.4). Then, when choosing S of degree at least C·H/h, we can set C_D(m, n) = H/h, and the constants C_1 and C_2 are independent of H, h, enabling us to prove coarse space size independent convergence. Let us recall that ω is the damping parameter from the definition (5.8) of S'. We now formulate the abstract convergence estimate.

Theorem 5.3.5. Let e_i denote the error after i iterations given by steps 1–4 of Algorithm 16, and let e_i^f be the error e_i smoothed by step 5. Then, under Assumption 5.3.3, we have the following error estimate:

    ‖e_i^f‖_A ≤ η^i ‖e_0‖_A,    (5.13)


where

    η = [1 + (C_1C_2)^{-2} ω(2−ω)]^{-1},  ω ∈ (0,2).    (5.14)

Proof. In view of Remark 5.3.1 b), it suffices to prove the estimate

    ‖e_{i+1}‖_{A_S} ≤ η ‖e_i‖_{A_S}    (5.15)

for the error of the iterations without the final smoothing step. As the first step in this proof, we will adopt a different view of our method: namely, we will demonstrate that our two-level method with the smoothed prolongator SP̄, pre-smoother S_s and post-smoother S_s', applied to the problem with the matrix A, can be viewed as a two-level method with the tentative prolongator P̄ applied to the problem with a "smoothed" matrix A_S = AS².

First define Q_S: R^n → Ker(S)^⊥ to be the orthogonal projection with respect to the Euclidean inner product. Note that S may be singular. Since S commutes with A, the eigenvectors of S and A_S coincide. Consequently, Q_S is A_S-symmetric and ‖Q_S‖_{A_S} = 1. It is routine to derive that the linear part of steps 1–4 is given by

    S' [I − P̄ (P̄ᵀ A_S P̄)^+ P̄ᵀ A_S] S,    (5.16)

where (P̄ᵀA_SP̄)^+ is a pseudo-inverse of P̄ᵀA_SP̄. Since Ker(P̄ᵀA_SP̄) = {x ∈ R^m : P̄x ∈ Ker(S)}, the algorithm is independent of the particular choice of the pseudo-inverse. Thus, the method can alternatively be viewed as a standard two-level method for solving a problem with the matrix A_S (in place of A) and the prolongator P̄ (in place of


SP̄). Adopting this view will aid us in estimating the right-hand side in (5.16) in the A_S-operator matrix norm.

Let us define T = Ker(P̄ᵀA_S) ∩ Ker(S)^⊥ and consider the A_S-orthogonal decomposition

    Ker(S)^⊥ = T ⊕ Range(Q_S P̄).

Consider the coarse level correction part P = I − P̄(P̄ᵀA_SP̄)^+P̄ᵀA_S of the error propagation operator on the right-hand side of (5.16). It is easy to see that P restricted to Ker(S)^⊥ is an A_S-orthogonal projection with Range(P) ∩ Ker(S)^⊥ ⊂ T. Further, as S' is a polynomial in A_S and S commutes with A, S' commutes with S. Therefore, taking into account that ρ(S) ≤ 1, ρ(S') ≤ 1, ‖Q_S‖_{A_S} = 1 and ‖P|_{Ker(S)^⊥}‖_{A_S} = 1, we have

    ‖S' P S‖²_{A_S} ≤ sup_{x ∈ T\{0}} min { ‖Sx‖²_{A_S}/‖x‖²_{A_S}, ‖S'x‖²_{A_S}/‖x‖²_{A_S} }.    (5.17)

The rest of the proof consists in demonstrating that at least one of the two expressions in the minimum above is bounded by η for any x ∈ T \ {0}, with η defined by (5.14).


We will now show that for our definition of S' the following algebraic lift holds:

    ‖S'x‖²_{A_S}/‖x‖²_{A_S} ≤ 1 − (C_1C_2)^{-2} ω(2−ω) ‖Sx‖²_{A_S}/‖x‖²_{A_S},    (5.18)

where x ∈ T is an error lying in the range of the coarse grid correction operator. In other words, we will show that if, for the particular x ∈ T, S is inefficient in reducing the error x, i.e., ‖Sx‖_{A_S}/‖x‖_{A_S} ≥ q, q ∈ (0,1], then S' will be efficient in the sense that

    ‖S'x‖²_{A_S}/‖x‖²_{A_S} ≤ 1 − q² (C_1C_2)^{-2} ω(2−ω).

Simple manipulations yield

    ‖S'x‖²_{A_S} = ‖(I − ω ρ̄^{-1}(A_S) A_S)x‖²_{A_S}
        = ‖x‖²_{A_S} − 2ω ρ̄^{-1}(A_S) ‖A_S x‖² + ω² ρ̄^{-2}(A_S) (A_S A_S x, A_S x)
        ≤ ‖x‖²_{A_S} − ω(2−ω) ρ̄^{-1}(A_S) ‖A_S x‖².

Consequently,

    ‖S'x‖²_{A_S}/‖x‖²_{A_S} ≤ 1 − ω(2−ω) ρ̄^{-1}(A_S) ‖A_S x‖²/‖x‖²_{A_S}.    (5.19)

Let us recall that A_S = SAS and S commutes with A. Therefore, we obtain

    ‖Sx‖²_{A_S} = ‖S²x‖²_A,    (5.20)


and, consequently,

    ‖A_S x‖²/‖x‖²_{A_S} = (‖AS²x‖²/‖S²x‖²_A) (‖S²x‖²_A/‖x‖²_{A_S}) = (‖AS²x‖²/‖S²x‖²_A) (‖Sx‖²_{A_S}/‖x‖²_{A_S}).    (5.21)

Now, consider x ∈ T ⊂ Ker(P̄ᵀAS²). Then, setting u = S²x, we have u ∈ Ker(P̄ᵀA) = Range(P̄)^{⊥_A}, where ⊥_A denotes the A-orthogonal complement of a set. For the operator Q_c satisfying the weak approximation property (5.11), we estimate the ratio ‖AS²x‖²/‖S²x‖²_A using the standard orthogonality argument:

    ‖u‖²_A = (Au, u) = (Au, u − Q_c u) ≤ ‖Au‖ ‖u − Q_c u‖ ≤ C_1 C_D(m, n) ρ̄^{-1/2}(A) ‖Au‖ ‖u‖_A,

which, in view of (5.20), means that

    ‖AS²x‖²/‖S²x‖²_A ≥ C_1^{-2} C_D^{-2}(m, n) ρ̄(A).

Substituting this estimate into (5.21), using Assumption 5.3.3 in the form ρ̄^{-1}(A_S) ≥ C_2^{-2} C_D²(m, n) ρ̄^{-1}(A), we obtain

    ‖S'x‖²_{A_S}/‖x‖²_{A_S} ≤ 1 − (C_1C_2)^{-2} ω(2−ω) ‖Sx‖²_{A_S}/‖x‖²_{A_S},

completing the proof of (5.18).


Finally, due to (5.16), (5.17), (5.18) and ‖Sx‖²_{A_S}/‖x‖²_{A_S} ∈ [0,1], we may write

    ‖S'[I − P̄(P̄ᵀA_SP̄)^+P̄ᵀA_S]S‖²_{A_S} ≤ sup_{α ∈ [0,1]} min { α, 1 − α (C_1C_2)^{-2} ω(2−ω) }.

The expression on the right-hand side is bounded by [1 + (C_1C_2)^{-2} ω(2−ω)]^{-1}, which completes the proof of the theorem. □

5.4 Practical Issues

5.4.1 Generalizations

Although we have formulated the method for scalar problems, it is not difficult to generalize it for solving nonscalar ones, such as 2D and 3D elasticity or the plate and shell problems. This can be done in the same manner as described in Section 3.4.2, with only one difference, namely that we now need to add the (block) diagonal scaling. We note that MLS was formulated with second order problems in mind. Its use for solving higher order problems, although possible, is unwarranted by the theory. From the practical point of view, the smoothing procedure based on matrices resulting from discretizations of higher order problems would generate too extensive an overlap of the smoothed coarse space basis function supports and produce rapid fill-in in the coarse problem. A true algebraic multigrid using the concept of smoothed transfer operators, suitable for solving higher order problems, was proposed in Vaněk, Mandel and Brezina [87].
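Before turning to complexity, here is a compact sketch of one iteration of Algorithm 16 under the simplest choice of components: a degree-1 damped-Jacobi polynomial S as the prolongator smoother and linear part of the pre-smoother, and S' from (5.8). This is a minimal rendering with assumed names, not the production code used in Chapter 6.

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import spsolve

    def mls_cycle(A, P, x, b, omega=1.0, rho=None):
        """One iteration (steps 1-5) of Algorithm 16 (MLS) with a
        degree-1 prolongator smoother S = I - (omega/rho) A."""
        n = A.shape[0]
        rho = rho or abs(A).sum(axis=1).max()       # upper bound on rho(A)
        S = sp.eye(n) - (omega / rho) * A           # linear part of S_s
        presmooth = lambda x: x + (omega / rho) * (b - A @ x)
        x = presmooth(x)                            # 1. x <- S_s(x, b)
        SP = (S @ P).tocsr()                        # smoothed prolongator
        Ac = (SP.T @ A @ SP).tocsc()                # coarse P^T S A S P
        x = x - SP @ spsolve(Ac, SP.T @ (A @ x - b))  # 2.-3. coarse corr.
        # 4. post-smoother with error propagation S' = I - (omega/rho_S) A_S,
        #    A_S = S^2 A; since rho(S) <= 1, rho serves as the bound rho_S
        x = x + (omega / rho) * (S @ (S @ (b - A @ x)))
        return presmooth(x)                         # 5. post-processing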


5.4.2 Computational Complexity

The following theorem gives the computational complexity estimate for the two-level method with smoothed prolongator implemented on a serial architecture.

Theorem 5.4.1. Let the assumptions of Theorem 5.3.5 be fulfilled and the Cholesky factorization be used to solve the coarse-level problem. Then the optimal number of elements per subdomain using MLS in 3D is N_es ≈ n^{1/2}, and the system (5.2) can be solved to the level of truncation error in O(n^{7/6}) operations. In 2D the optimal number of elements per subdomain is N_es ≈ n^{2/5}, and the system (5.2) can be solved to the level of truncation error in O(n^{6/5}) operations.

Proof. The setup phase requires O(n N_es^{1/d}) operations to construct the smoothed prolongator P = SP̄, O(n) operations to construct the coarse level matrix PᵀAP, and O((n/N_es)^{(3d−2)/d}) operations to compute the Cholesky factorization of the coarse matrix. Each iteration requires O(n N_es^{1/d}) operations to perform the smoothing, O(n) operations to evaluate the prolongation and restriction and to compute the defect, and O((n/N_es)^{(2d−1)/d}) operations to compute the back-substitution. Theorem 5.3.5 guarantees that the number of iterations needed to converge to a required precision is O(1). Minimizing the overall expense, we obtain the estimates of the theorem. □

Although MLS can take advantage of parallel implementation and the iteration time can be reduced dramatically, its lack of locality of data prevents asymptotic improvements in the computational complexity, which remains O(n^{7/6})


operations in 3D and O(n^{6/5}) operations in 2D. Despite this, Theorem 5.4.1 shows that of all the methods we have considered, MLS has the lowest computational complexity. In 3D, the asymptotic computational complexity of MLS equals that of the massively parallel implementations of both BOSS (Theorem 3.4.2) and ACDD(∞) (Theorem 4.5.11).


6. Numerical Experiments

We now present the results of numerical experiments. We begin with a rather academic set of problems and proceed to the practically more interesting problems on "real-world" geometries. In all the problems below, the methods BDD, ACDD, BOSS and MLS described in the text are applied as a preconditioner M^{-1} in the Orthomin implementation of the conjugate gradient method (Algorithm 2). The stopping criterion used was

    (M^{-1}r_i, r_i) ≤ ε² (M^{-1}r_0, r_0),

where ε = 10^{-6} for all the problems except when explicitly specified otherwise. Note that this criterion guarantees the energy norm error estimate ‖e_i‖_A ≤ Cε‖x‖_A (cf. Section B.1; notes on estimating the condition number can also be found there).

For comparison of the methods, we report the condition number estimate for the preconditioned system and the number of iterations required to satisfy the stopping criterion. The CPU times are not reported, as this information could be misleading due to the varying maturity of the code for the individual methods.

Before presenting the results of the tests, let us note that the weight matrices D_i used in the ACDD Algorithm 14 were chosen to be the diagonal entries of the local stiffness matrices A_i corresponding to the interface degrees of freedom.
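For reference, a bare-bones preconditioned conjugate gradient loop with the stopping test above (a generic PCG sketch, not the thesis' Orthomin code; the condition number estimates in the tables would additionally be extracted from the Lanczos coefficients of the same run):

    import numpy as np

    def pcg(A, b, precond, eps=1e-6, maxit=300):
        """Preconditioned CG for A x = b; stops once the preconditioned
        residual norm (M^{-1} r_i, r_i) drops by the factor eps^2."""
        x = np.zeros_like(b)
        r = b - A @ x
        z = precond(r)
        p = z.copy()
        rz0 = rz = r @ z
        for it in range(maxit):
            if rz <= eps**2 * rz0:
                break
            Ap = A @ p
            alpha = rz / (p @ Ap)
            x += alpha * p
            r -= alpha * Ap
            z = precond(r)
            rz, rz_old = r @ z, rz
            p = z + (rz / rz_old) * p
        return x, it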


6.1 Model Problems

The original BDD method is known to be very robust with respect to variation in the coefficient values of problem (1.2). We have conducted a series of experiments in an attempt to determine how well this robustness is preserved with respect to the changing number of iterations used to approximate the exact local subdomain inverses in ACDD. We denote the method with approximate local solvers by ACDD(k), where k is the number of iterations performed on each subdomain.

As we are interested in the behavior with respect to varying coefficients, the problem was set up on a simple cubical domain subdivided into 125 subdomains with a uniform discretization mesh. The global number of degrees of freedom in the discretized system was 68,921. The iteration we chose for approximating the local inverses was SSOR. Implementation of an algebraic multigrid procedure to replace SSOR is currently in progress; it is expected to yield much improved results.

In the first set of experiments we solve the weighted Poisson equation on the given domain. The jumps in coefficients follow the checkerboard pattern depicted in Figure 6.1. Table 6.1 contains the numbers of iterations required to reduce the relative residual to ε = 10^{-6}. In the case of the Poisson equation (α_{1,2} = 1), if only one iteration of SSOR is performed, the method converges in 23 iterations. This is an almost fivefold increase compared to BDD, but still a significant improvement over the diagonally preconditioned conjugate gradient method, which failed to converge in 300 iterations.
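The SSOR sweep used as the local approximate solver can be written with two triangular solves per iteration; a sketch (in the experiments it is applied k times to each local problem):

    import scipy.sparse as sp
    from scipy.sparse.linalg import spsolve_triangular

    def ssor_sweep(A, x, b, omega=1.0):
        """One symmetric SOR iteration for the SPD system A x = b."""
        D_over_omega = sp.diags(A.diagonal() / omega)
        L = sp.tril(A, k=-1)
        x = x + spsolve_triangular((D_over_omega + L).tocsr(),
                                   b - A @ x, lower=True)    # forward sweep
        x = x + spsolve_triangular((D_over_omega + L.T).tocsr(),
                                   b - A @ x, lower=False)   # backward sweep
        return x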


Table 6.1. Comparison of BDD with ACDD(k) for different values of k in the case of the checkerboard coefficient pattern: numbers of iterations required and condition number estimates.

                α_{1,2} = 1    α_{1,2} = 10   α_{1,2} = 10^2  α_{1,2} = 10^3
Method used    Iter. Cond.    Iter. Cond.    Iter. Cond.     Iter. Cond.
BDD              5   1.18      11   3.07      10   3.06       10   3.05
ACDD(200)        5   1.18      11   3.07      10   3.06       10   3.05
ACDD(150)        6   3.83      11   3.07      10   3.06       10   3.05
ACDD(100)        7   3.89      12   3.07      11   3.06       11   3.05
ACDD(50)        10   3.89      12   3.08      11   3.05       11   3.04
ACDD(40)        11   3.90      12   3.08      12   3.06       11   3.02
ACDD(30)        12   4.11      13   3.14      12   3.09       11   2.97
ACDD(20)        12   4.20      13   2.27      12   3.17       12   3.04
ACDD(10)        13   4.42      14   3.73      13   3.40       12   3.18
ACDD(8)         14   4.53      15   3.96      13   3.50       12   3.20
ACDD(5)         15   4.88      16   4.60      14   3.74       12   3.26
ACDD(4)         15   5.18      17   4.95      14   3.85       13   3.29
ACDD(3)         16   5.75      18   5.47      15   4.01       13   3.34
ACDD(2)         18   6.87      20   6.64      16   4.49       14   3.74
ACDD(1)         23   9.37      25   9.40      19   5.81       17   5.24


Table 6.2 contains the results of solving a problem with randomly distributed coefficients in a given range. The distribution of the coefficients is exponential. The results for coefficients in the intervals [10^{-1}, 10^1], [10^{-2}, 10^2] and [10^{-3}, 10^3] are reported. In these experiments we observe that for coefficients in [10^{-1}, 10^1] we recover the convergence properties of BDD if we perform about 10 iterations of SSOR on each subdomain. For coefficients in the interval [10^{-2}, 10^2] we see a similar pattern. Note that the condition number estimates in this case speak in favor of the application of approximate subdomain solvers. If only one iteration on each subdomain is performed, the number of iterations needed to reduce the relative residual to 10^{-6} is less than twice that of BDD. For large variation in the coefficients, the convergence properties of ACDD(k) appear to be superior to those of BDD.

Table 6.3 contains the results of solving a problem with randomly distributed coefficients in a given range. Here the distribution of the coefficients is uniform. The double columns list the results for coefficients in the intervals [10^{-1}, 10^1], [10^{-2}, 10^2] and [10^{-3}, 10^3], respectively. In these experiments, we observe a similar behavior in all three ranges of coefficients. We recover the convergence properties of BDD if we perform only a few iterations of SSOR on each subdomain. If only one iteration on each subdomain is performed, the number of iterations needed to reduce the relative residual to 10^{-6} is less than twice that needed for BDD.


Table 6.2. Comparison of BDD with ACDD(k) for a problem with exponentially distributed random coefficients: numbers of iterations required and condition number estimates for coefficient jumps in the ranges 10^{-1} - 10^1, 10^{-2} - 10^2 and 10^{-3} - 10^3.

Table 6.3. Comparison of BDD with ACDD(k) for a problem with uniformly distributed random coefficients: numbers of iterations required and condition number estimates.

               Jumps 10^-1 - 10^1   Jumps 10^-2 - 10^2   Jumps 10^-3 - 10^3
Method used     Iter.  Cond. est.    Iter.  Cond. est.    Iter.  Cond. est.
BDD              16      4.38         13      3.25         11      2.59
ACDD(200)        16      4.38         13      3.25         11      2.59
ACDD(150)        16      4.38         13      3.25         11      2.59
ACDD(100)        16      4.38         13      3.24         11      2.60
ACDD(50)         16      4.35         13      3.22         12      2.64
ACDD(40)         15      4.34         13      3.21         12      2.69
ACDD(30)         15      4.34         13      3.23         12      2.79
ACDD(20)         15      4.40         13      3.35         12      3.02
ACDD(10)         15      4.77         14      4.03         13      3.49
ACDD(8)          15      4.98         14      4.33         13      3.63
ACDD(5)          17      5.92         16      4.98         14      3.89
ACDD(4)          18      6.52         16      5.29         15      4.00
ACDD(3)          19      7.37         17      5.71         16      4.21
ACDD(2)          22      8.77         19      6.37         18      5.09
ACDD(1)          26     11.97         23      8.15         22      7.56


Table 6.4. Comparison of BDD with BOSS and MLS for a problem with coefficient jumps in a checkerboard pattern formed by 125 subdomains. Coarse spaces of dimensions 2,744 and 125 were used for MLS and BOSS. A prolongation smoother of degree 1 was used, and MLS used 2 pre-smoothers and 2 post-smoothers.

                 BDD         BOSS        BOSS        MLS         MLS
CS dim.          125         125         2744        125         2744
                It. Cond.   It. Cond.   It. Cond.   It. Cond.   It. Cond.
Poisson          5  1.18     8  2.35     5  1.10     9  2.93     5  1.21
α_{1,2} = 10    11  3.07    12  3.01     5  1.12    14  4.31     6  1.27
α_{1,2} = 10^2  10  3.06    12  3.11     5  1.13    15  4.49     6  1.29
α_{1,2} = 10^3  10  3.05    12  3.11     5  1.13    16  4.50     6  1.29

The results of our numerical experiments confirm the theoretically predicted convergence of the method. We observed that the method using approximate subdomain solvers mimics the desirable properties of BDD. Using approximate local solvers, while keeping the asymptotic cost of each local solve the same as for BDD, provides considerable savings in the setup phase, since with the iterative subdomain solvers there is no need for expensive matrix factorizations.

Table 6.5 displays the results of a comparison of BDD with the BOSS method of Chapter 3 and the MLS method of Chapter 5. A prolongator smoother of degree 1 was used in both BOSS and MLS. The domain was subdivided into 125 subdomains, and the coefficients of the problem were generated randomly, with a uniform distribution. All three methods performed well for all ranges of coefficients. The results suggest BOSS to be the least sensitive of the three with respect to the variation of the coefficients.

Table 6.6 displays the results of a comparison of BDD with the BOSS and MLS methods. A prolongator smoother of degree 4 was used.


Figure 6.1. The checkerboard coefficient pattern. Dark subdomains correspond to values α_1, the light ones to α_2.

Table 6.5. Comparison of BDD with BOSS and MLS for a problem with uniformly distributed random coefficients and a coarse space of dimension 125. A prolongation smoother of degree 1 was used, and MLS used 2 pre-smoothers and 2 post-smoothers.

Range of         BDD           BOSS          MLS
jumps         Iter. Cond.   Iter. Cond.   Iter. Cond.
Poisson          5   1.18      8   2.35      9   2.93
10^-1 - 10^1    16   4.38      8   2.50     10   3.14
10^-2 - 10^2    13   3.25      8   2.32     11   3.14
10^-3 - 10^3    11   2.59      8   2.27     11   3.16


Table 6.6. Comparison of BDD with BOSS and MLS for a problem with uniformly distributed random coefficients and a coarse space of dimension 125. A prolongation smoother of degree 4, and 4 pre-smoothers and 4 post-smoothers, were used in MLS.

Range of        BDD            BOSS           MLS
jumps        Iter. Cond.    Iter. Cond.    Iter. Cond.
Poisson         5   1.18       4   1.11       6   1.66
10^-1 - 10^1   16   4.38      untested        6   1.45
10^-2 - 10^2   13   3.25      untested        6   1.44
10^-3 - 10^3   11   2.59      untested        6   1.45

The domain was subdivided into 125 subdomains, and the coefficients of the problem were generated randomly, with a uniform distribution. We may observe an improvement in the convergence gained by employing a more powerful prolongation smoother.

Tables 6.7 and 6.8 display the results of a comparison of BDD with BOSS and MLS for a problem with exponentially distributed random coefficients and a prolongator smoother of degree 1 and 4, respectively. The domain was subdivided into 125 subdomains. Both BOSS and MLS appear to be more stable with respect to the random variation in the coefficients and, as before, using a more powerful prolongator smoother results in a more robust iteration.

Tables 6.9 and 6.10 present the comparison of the results of applying BDD, BOSS and MLS to a problem subdivided into 2,744 subdomains, with uniformly and exponentially distributed random coefficients in different ranges. A prolongator smoother of degree 1 was used. We see that for uniformly distributed coefficients both BOSS and MLS are very robust, resulting in condition number estimates not exceeding 1.20. For the exponentially distributed coefficients, we observe a deterioration of the performance of MLS, reaching a condition number estimate of about 24.


Table 6.7. Comparison of BDD with BOSS and MLS for a problem with exponentially distributed random coefficients and a coarse space of dimension 125. A prolongation smoother of degree 1 was used for both BOSS and MLS, and 2 pre-smoothers and 2 post-smoothers in MLS.

Range of        BDD             BOSS          MLS
jumps        Iter.  Cond.    Iter. Cond.   Iter. Cond.
Poisson         5    1.18       8   2.35      9   2.93
10^-1 - 10^1   16    4.38       9   2.92     12   4.08
10^-2 - 10^2   13    3.25      11   3.67     20  10.36
10^-3 - 10^3  185  2247.8      12   4.52     39  46.99

Table 6.8. Comparison of BDD with BOSS and MLS for a problem with exponentially distributed random coefficients and a coarse space of dimension 125. A prolongation smoother of degree 4 was used for BOSS and MLS, and 4 pre- and post-smoothers in MLS.

Range of        BDD             BOSS           MLS
jumps        Iter.  Cond.    Iter. Cond.    Iter. Cond.
Poisson         5    1.18       4   1.11       6   1.66
10^-1 - 10^1   16    4.38      untested        6   1.59
10^-2 - 10^2   13    3.25      untested       13   4.87
10^-3 - 10^3  185  2247.8      untested       32  46.33

Table 6.9. Comparison of BDD with BOSS and MLS for a problem with uniformly distributed random coefficients and a coarse space of dimension 2,744, a prolongator smoother of degree 1, and 2 pre- and post-smoothers in MLS.

Range of        BDD           BOSS          MLS
jumps        Iter. Cond.   Iter. Cond.   Iter. Cond.
Poisson         7   1.52      4   1.15      4   1.18
10^-1 - 10^1   11   2.51      4   1.05      4   1.17
10^-2 - 10^2   11   2.71      4   1.05      5   1.18
10^-3 - 10^3   11   2.71      4   1.15      4   1.18


Table 6.10. Comparison of BDD with BOSS and MLS for a problem with exponentially distributed random coefficients and a coarse space of dimension 2,744. A prolongation smoother of degree 1 was used, and 2 pre-smoothers and 2 post-smoothers in MLS.

Range of        BDD              BOSS          MLS
jumps        Iter.  Cond.     Iter. Cond.   Iter. Cond.
Poisson         7     1.52       4   1.15      4   1.18
10^-1 - 10^1   18     5.24       4   1.11      5   1.40
10^-2 - 10^2   90   124.12       6   1.47     12   5.03
10^-3 - 10^3  540  5,513.4       8   1.93     26  24.08

6.2 Real World Problems

We have also applied the BOSS and MLS methods to several real-world problems. The data for these tests was made available by Charbel Farhat of the Aerospace Engineering Department, University of Colorado at Boulder.

Table 6.11 presents results for a shell problem of an automobile wheel (Figure 6.2) discretized by 3-node shell elements with 6 degrees of freedom per node. The discrete system obtained by the discretization had 59,490 degrees of freedom. A prolongator smoother of degree 1 was used in both BOSS and MLS. Two sizes of the coarse problem were tested. Good convergence was observed for both BOSS and MLS for the coarse space of 1,260 nodes. For the small coarse space of 158 nodes, the condition number of the problem increases to 57.60 for BOSS and to 77.71 for MLS. This can be explained by the fact that the ratio of the fine space and coarse space resolutions is large here, so a prolongator smoother of degree 1 cannot guarantee convergence independent of this ratio. The degree of the prolongator smoother would have to be increased.


Table 6.11. Comparison of BOSS and MLS for solving the shell problem on a mesh discretizing an automobile wheel with 9,915 nodes and 59,490 degrees of freedom. A prolongator smoother of degree 1 was used, and 4 pre-smoothers and 4 post-smoothers in MLS.

Nodes in            BOSS           MLS
coarse space     Iter. Cond.    Iter. Cond.
1260               13   5.42      13   6.03
 158               37  57.60      40  77.71

Figure 6.2. The mesh of the automobile wheel (data courtesy of Charbel Farhat, University of Colorado at Boulder).


Table 6.12. Comparison of BOSS and MLS for solving the 3D elasticity problem with 25,058 nodes and 75,174 degrees of freedom. A prolongator smoother of degree 1 was used, and 2 pre-smoothers and 2 post-smoothers in MLS.

Nodes in coarse space: 1438 and 97.

Table 6.12 presents results for the 3D elasticity problem discretized on the mesh of Figure 6.3 by tetrahedral elements. The discrete system obtained by the discretization had 75,174 degrees of freedom. A prolongator smoother of degree 1 was used in both BOSS and MLS. Two sizes of the coarse problem were tested. Good convergence was observed for both BOSS and MLS for the coarse space of 1,438 nodes. For the small coarse space of 97 nodes, the condition number of the problem increases to 8.64 for BOSS and to 19.68 for MLS. Although these are still good results, we can see that an application of a prolongator smoother of degree greater than 1 may be appropriate for this coarse space. We note that the BOSS method is more robust with respect to the variation of the coarse space size, which can be explained by its link to overlapping Schwarz methods.

Table 6.13 presents results for the shell problem discretized on the mesh of a propeller (Figure 6.4) by 8-node brick elements. The discretized system had 123,120 degrees of freedom. A prolongator smoother of degree 1 was used in both BOSS and MLS.


Figure 6.3. The mesh of the solid with tetrahedron elements (data courtesy of Charbel Farhat, University of Colorado at Boulder).

Table 6.13. Comparison of BOSS and MLS for solving a shell problem: a propeller with 41,040 nodes and 123,120 degrees of freedom. A prolongator smoother of degree 1 was used, and 4 pre-smoothers and 4 post-smoothers in MLS.

    Method used              BOSS             MLS
    Nodes in coarse space    Iter.   Cond.    Iter.   Cond.
    1730                      16     20.52     11     14.19


Figure 6.4. The mesh of the turbine with 123,120 elements (data courtesy of John Abel, Cornell University).


7. Conclusions

This study was motivated by the need to solve large algebraic systems arising from discretizations of three-dimensional problems, and by the intolerable asymptotic complexity growth of traditional direct solvers. We have presented three methods for the efficient solution of such large sparse systems. Convergence results for these methods have been proved, suggesting very good performance. For ACDD, the method of Chapter 4, a condition number bound proportional to $(1+\log(H/h))^2$ has been established. The BOSS method of Chapter 3 and the MLS method of Chapter 5 allow a certain level of adaptivity, achieved by using intergrid transfer operators smoothed by a polynomial smoother. Increasing the amount of smoothing has a positive effect on the convergence properties, but it increases the computational complexity, since with increasing degree of the smoothing polynomial the overlap of our subdomains increases. In the case of the BOSS method, this allows us to move on the scale between domain decomposition without overlap and domain decomposition with large overlap, depending on the size of the coarse space and the difficulty of the problem to be solved. The use of smoothed transfer operators also resolves the issue of generating a decomposition into overlapping subdomains. The method starts by generating a nonoverlapping covering of all nodes of the mesh by a simple greedy algorithm and achieves the desired amount of overlap by applying a smoothing polynomial of appropriate degree; a sketch of this construction follows.
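To make the construction concrete, the following is a minimal sketch of the greedy covering step and of a degree-1 prolongator smoothing step. It assumes the mesh is given only by a node-adjacency matrix in CSR format (without self-loops); the function names, the tie-breaking strategy, and the damping parameter are illustrative assumptions, not the thesis implementation.

    # Minimal sketch (illustrative, not the thesis code): greedy covering
    # of the mesh nodes by disjoint aggregates, then prolongator smoothing.
    import numpy as np
    import scipy.sparse as sp

    def greedy_aggregates(adj):
        """A node whose whole neighborhood is still unassigned seeds a new
        aggregate together with its neighbors; leftover nodes then join an
        adjacent aggregate, so all nodes end up covered without overlap."""
        n = adj.shape[0]
        agg = -np.ones(n, dtype=int)
        count = 0
        for i in range(n):
            nbrs = adj.indices[adj.indptr[i]:adj.indptr[i + 1]]
            if agg[i] == -1 and np.all(agg[nbrs] == -1):
                agg[i] = count
                agg[nbrs] = count
                count += 1
        for i in range(n):
            if agg[i] == -1:
                nbrs = adj.indices[adj.indptr[i]:adj.indptr[i + 1]]
                assigned = nbrs[agg[nbrs] >= 0]
                if assigned.size:
                    agg[i] = agg[assigned[0]]
                else:                      # isolated node: its own aggregate
                    agg[i] = count
                    count += 1
        return agg, count

    def smoothed_prolongator(A, agg, nagg, omega=2.0 / 3.0):
        """Tentative prolongator of the aggregation, smoothed by one damped
        Jacobi step, P = (I - omega D^{-1} A) P_tent; each smoothing step
        widens the support of the coarse basis functions, i.e. the overlap."""
        n = A.shape[0]
        P_tent = sp.csr_matrix((np.ones(n), (np.arange(n), agg)),
                               shape=(n, nagg))
        Dinv = sp.diags(1.0 / A.diagonal())
        return (P_tent - omega * (Dinv @ (A @ P_tent))).tocsr()

A higher smoothing degree corresponds to applying the damped Jacobi factor repeatedly, which is how the overlap is enlarged without ever forming overlapping subdomains explicitly.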


This feature is also shared by the MLS method, where no subdomain problems are solved; instead, a combination of coarse grid correction and smoothing similar to a multigrid V-cycle takes place, as sketched below. Degree 1 of the prolongator smoother was sufficient in most computational experiments. All three methods are suitable for unstructured meshes and can be implemented fully black-box. If additional information is available, the methods use it to improve their convergence properties. This additional input usually comes in the form of a basis of the nullspace of the unconstrained matrix of the problem (rigid body modes for elasticity).
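The following is a minimal two-level sketch of such a cycle under simple assumptions (damped Jacobi smoothing, a Galerkin coarse operator, an exact coarse solve); the function name and parameter choices are mine, not the thesis code. Here P is a (smoothed) prolongator such as the one sketched above.

    # Minimal two-level cycle sketch (illustrative assumptions, not the
    # thesis implementation) for A x = b with prolongator P.
    import scipy.sparse.linalg as spla

    def two_level_cycle(A, P, b, x, nu=2, omega=2.0 / 3.0):
        """One symmetric two-level iteration: nu pre-smoothing steps, a
        coarse grid correction through P, and nu post-smoothing steps."""
        Dinv = 1.0 / A.diagonal()
        for _ in range(nu):                          # pre-smoothing
            x = x + omega * Dinv * (b - A @ x)
        Ac = (P.T @ A @ P).tocsc()                   # Galerkin coarse operator
        x = x + P @ spla.spsolve(Ac, P.T @ (b - A @ x))  # coarse correction
        for _ in range(nu):                          # post-smoothing
            x = x + omega * Dinv * (b - A @ x)
        return x

The numbers of pre- and post-smoothing steps and the degree of the prolongator smoother are precisely the parameters varied in the experiments of Chapter 6.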


The methods have been applied to both artificial and real-world problems to test their robustness with respect to aspects such as discontinuities of coefficients and unstructured meshes. We have established that these methods, when used as preconditioners, provide a significant improvement in conjugate gradient performance. At the same time, the asymptotic computational complexity estimates obtained suggest that using ACDD, BOSS or MLS to solve 3D problems discretized on a mesh with large fractal dimension requires, even for serial implementations, less work than just the back-substitution stage of a direct solver.

Numerical experiments show that the methods are applicable to a broader range of problems than those considered in this thesis. This suggests the direction of future research: the application of BDD to solving plate and shell problems was studied in [56], which also applies to ACDD, but a similar analysis remains to be carried out for both BOSS and MLS.

On the practical front, in order to extend the applicability of ACDD(k), the implementation of more efficient multigrid local approximate solvers, now in progress, has to be finished. These solvers are fully algebraic to preserve the black-box philosophy of ACDD. The original implementation of ACDD was written for solving scalar problems only, and although some of the libraries have already been rewritten to support nonscalar problems, the rest of the project will have to conform before computational results measuring the performance for nonscalar problems are available.


A. Appendix: Theoretical Results

In this chapter we state and prove some results used in the text. These results are well known to readers acquainted with domain decomposition methods.

A.1 Poincaré-Friedrichs Inequality

The following is a version of the well-known Poincaré-Friedrichs inequality. We include it here for the sake of completeness and because it is formulated more generally than usual; the traditional versions (see, e.g., [72, Theorem 7.1], [76]) easily follow from it.

Lemma A.1.1. Let $P^m$ denote the space of polynomials of degree at most $m$, let $B$ be a Banach space, $m > 0$, and let $K : H^m(\Omega) \to B$ be a bounded linear operator such that the following implication holds:

    Ku = 0 \ \text{and} \ u \in P^{m-1} \implies u = 0.    (A.1)

Further, let $\Omega$ be a domain such that $\mathrm{diam}(\Omega) = 1$. Then there exists a constant $C(\Omega) > 0$ such that

    \|u\|_{H^m(\Omega)} \le C(\Omega) \left( \|Ku\|_B + \|u^{(m)}\|_{L^2(\Omega)} \right)
    \quad \text{for all } u \in H^m(\Omega).    (A.2)

Since the converse inequality holds trivially, we obtain the following

Corollary A.1.2. Under the assumptions of Lemma A.1.1 it holds that

    \|u\|^2_{H^m(\Omega)} \asymp \|Ku\|^2_B + \|u^{(m)}\|^2_{L^2(\Omega)}.
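Before the proof, a brief illustration of how a traditional version follows (an added specialization, not part of the original text): take $m = 1$, $B = \mathbb{R}$, and $Ku = \int_\Omega u\,dx$.

    % Added specialization of Lemma A.1.1 (illustrative): m = 1, B = R,
    % Ku = \int_\Omega u dx.  K vanishes on a constant u only if u = 0,
    % so (A.1) holds, and (A.2) reads
    \[
        \|u\|_{H^1(\Omega)} \le C(\Omega)\Bigl(\Bigl|\int_\Omega u\,dx\Bigr|
            + \|\nabla u\|_{L^2(\Omega)}\Bigr),
    \]
    % which for functions with zero mean reduces to the classical
    % Poincare inequality
    \[
        \|u\|_{L^2(\Omega)} \le C(\Omega)\,\|\nabla u\|_{L^2(\Omega)},
        \qquad \int_\Omega u\,dx = 0.
    \]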


Proof. As the statement is obviously valid for $u = 0$, we assume $u \ne 0$ and proceed by contradiction. Assume that (A.2) does not hold. Then there exists a sequence $\{u_i\}$ in $H^m(\Omega)$ such that

    \|u_i\|_{H^m(\Omega)} = 1, \qquad \|Ku_i\|_B + \|u_i^{(m)}\|_{L^2(\Omega)} \to 0.

The compact imbedding of $H^1(\Omega)$ into $L^2(\Omega)$, known as the Rellich theorem [22], implies that $H^k(\Omega)$ is compactly imbedded in $H^{k-1}(\Omega)$, $k = 1, \ldots, m$, so there exists a subsequence $\{u_{i_j}\}$ of $\{u_i\}$ convergent in $H^{k-1}(\Omega)$ and in $L^2(\Omega)$. Therefore, recalling that $\|u_{i_j}^{(m)}\|_{L^2(\Omega)} \to 0$, the sequence $\{u_{i_j}\}$ is a Cauchy sequence in $H^m(\Omega)$, hence it converges to an element $u \in H^m(\Omega)$. Thus

    u_{i_j}^{(k)} \to u^{(k)} \ \text{in } L^2(\Omega), \qquad k = 1, \ldots, m,

i.e., all generalized derivatives of $u_{i_j}$ of degree up to $m$ converge in $L^2$ to the respective generalized derivatives of $u$, and since $\|u_{i_j}^{(m)}\|_{L^2(\Omega)} \to 0$, we conclude that $\|u^{(m)}\|_{L^2(\Omega)} = 0$, and $u \in P^{m-1}$. Due to the $H^m$-convergence and the continuity of $K$, we have $Ku = \lim_j Ku_{i_j} = 0$. Assumption (A.1) implies that $u = 0$, which is the sought contradiction. $\square$

Remark A.1.3. The advantage of the proof we present here is its generality; constructive proofs found in the literature rely on specific geometries. Let us note that the constant $C(\Omega)$ in Lemma A.1.1 depends only on the shape of the domain.

The following theorem is easily obtained from Lemma A.1.1 by scaling the $L^2$ and $H^1$ norms from a domain of diameter 1 to one of diameter $H$.


Theorem A.1.4. Let $P^m$ denote the space of polynomials of degree at most $m$, let $B$ be a Banach space, $m > 0$, and let $K : H^m(\Omega) \to B$ be a bounded linear operator such that the following implication holds:

    Ku = 0 \ \text{and} \ u \in P^{m-1} \implies u = 0.    (A.3)

Further, let $\Omega$ be a domain such that $\mathrm{diam}(\Omega) = H$. Then there exists a constant $C(\Omega) > 0$ such that (A.4) holds, the $H$-scaled analogue of (A.2).

The following lemma is a variation of a standard inequality on the space of traces.

Lemma A.1.5. Let $\Omega_i$ be a domain such that $\mathrm{diam}(\Omega_i) = 1$. Let $B$ be the linear operator on $H^{1/2}(\partial\Omega_i)$ defined by

    B(v) = v|_{\partial\Omega_i \cap \Gamma_D}    (A.5)

if $\mathrm{meas}(\partial\Omega_i \cap \Gamma_D) > 0$, i.e., if there are some Dirichlet boundary conditions imposed on $\partial\Omega_i$, and

    B(v) = \int_{\partial\Omega_i} v \, ds    (A.6)

otherwise. Then there exists a constant $C(\Omega_i)$ such that for all $u \in \mathrm{Ker}\,B$,

    |u|_{1/2,\partial\Omega_i} \le \|u\|_{1/2,\partial\Omega_i} \le C(\Omega_i)\,|u|_{1/2,\partial\Omega_i}.    (A.7)

Proof. It is sufficient to prove the proposition for the case of an arbitrary operator $B$ such that for all constant functions, $B(u) = 0$ implies $u = 0$. The left-hand inequality in (A.7) is immediate. Suppose the right-hand inequality in (A.7) is false. Then there exists a sequence $\{u_n\}$ in $\mathrm{Ker}\,B$ such that

    \|u_n\|_{1/2,\partial\Omega} = 1, \qquad n\,|u_n|_{1/2,\partial\Omega} \le \|u_n\|_{1/2,\partial\Omega},    (A.8)


which implies that

    |u_n|_{1/2,\partial\Omega} \to 0.    (A.9)

Because of the compact imbedding $H^{1/2}(\partial\Omega) \hookrightarrow\hookrightarrow L^2(\partial\Omega)$, there exists a subsequence $\{u_{n_k}\}$ of $\{u_n\}$ such that

    u_{n_k} \to u \ \text{in } L^2(\partial\Omega).    (A.10)

Since $u_{n_k}$ is bounded in $H^{1/2}(\partial\Omega)$, there exists (see, e.g., [95]) a subsequence, for notational convenience also denoted $\{u_{n_k}\}$, such that

    u_{n_k} \rightharpoonup u \ \text{weakly in } H^{1/2}(\partial\Omega).    (A.11)

Since $B$ is a bounded operator and $u_{n_k} \in \mathrm{Ker}\,B$, we have $B(u) = 0$. From (A.8),

    \|u_{n_k}\|_{1/2,\partial\Omega} \to \|u\|_{1/2,\partial\Omega} = 1,    (A.12)
    |u_{n_k}|_{1/2,\partial\Omega} \to |u|_{1/2,\partial\Omega} = 0.    (A.13)

It follows from (A.13) and the definition of the $H^{1/2}$ seminorm that there exists a constant $L$ such that $u = L$ almost everywhere. Since $B(u) = 0$, the assumption on $B$ yields $u = 0$, which contradicts (A.12). $\square$

Remark A.1.6. As in the case of Theorem A.1.4, for general domains the constant $C(\Omega)$ in (A.7) depends on the properties of the domain, and scaling of the appropriate norms reveals that for a domain blown up to diameter $H$, equation (A.7) changes to its $H$-scaled analogue (A.14).


As an application of Theorem A.1.4 and Lemma A.1.5, we have the following lemma.

Lemma A.1.7. Let $A$ denote the $n \times n$ stiffness matrix resulting from the finite element discretization of an elliptic second order differential equation on a domain of diameter 1 with a mesh of typical meshsize $h$, and let $S$ be the $m \times m$ Schur complement matrix obtained from $A$ by eliminating the interior degrees of freedom. Then

    \mathrm{cond}(A) = O\!\left(\frac{1}{h^2}\right), \qquad \mathrm{cond}(S) = O\!\left(\frac{1}{h}\right).    (A.15)

Proof. Let $a(u,v) = f(v)$ be the variational formulation from whose discretization $A$ was obtained. Using the equivalence of the Euclidean and continuous $L^2$ norms and the Poincaré inequality (A.4), we can estimate the Rayleigh quotient of $A$ as

    RQ(x) = \frac{(Ax,x)}{(x,x)} \ge C h^d \quad \text{for all } x \in \mathbb{R}^n.    (A.16)

As the finite element matrix $A$ is naturally scaled so that its diagonal elements are of order $h^{d-2}$, the upper bound of the spectrum of $A$ may be found from the Geršgorin theorem [47] to be

    RQ(x) \le \varrho(A) \le C h^{d-2}.

From the last estimate and (A.16), we obtain

    \mathrm{cond}(A) \le C \frac{1}{h^2}.

In order to investigate the conditioning of $S$, let us first write its definition, $S = A_{11} - A_{12} A_{22}^{-1} A_{21}$, where $A_{11}$, $A_{12}$, $A_{21}$ and $A_{22}$ are the blocks of $A$ corresponding to its decomposition into interface and interior degrees of freedom.


As the matrix $A_{12} A_{22}^{-1} A_{21}$ is positive definite, we obtain

    \varrho(S) \le \varrho(A_{11}) \le C h^{d-2}.    (A.17)

For the lower estimate of the spectrum of $S$ we use the equivalence of the Euclidean and continuous $L^2$ norms, the trace version of inequality (A.7), and the well-known equivalence $\|x\|_S \sim \|\Pi x\|_{H^{1/2}(\partial\Omega)}$, where $\Pi x$ denotes the finite element function with boundary degrees of freedom $x$. We obtain

    RQ(x) = \frac{(Sx,x)}{(x,x)} = \frac{s(\Pi x, \Pi x)}{(x,x)}
          \ge C h^{d-1} \frac{\|\Pi x\|^2_{H^{1/2}(\partial\Omega)}}{\|\Pi x\|^2_{L^2(\partial\Omega)}}    (A.18)
          \ge C h^{d-1}.    (A.19)

From the last inequality and equation (A.17) it follows that $\mathrm{cond}(S) \le C \frac{1}{h}$, concluding the proof. $\square$
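As a quick numerical illustration of the first estimate in (A.15) (my addition, not from the original text): for the 1D model problem the stiffness matrix is tridiagonal and its condition number can be computed directly, exhibiting the $O(1/h^2)$ growth.

    # Numerical illustration (not from the thesis): cond(A) = O(1/h^2) for
    # the 1D Poisson stiffness matrix on a uniform mesh with h = 1/n.
    import numpy as np

    for n in [16, 32, 64, 128]:
        h = 1.0 / n
        # standard P1 stiffness matrix (Dirichlet BC): (1/h)*tridiag(-1,2,-1)
        A = (np.diag(2.0 * np.ones(n - 1))
             - np.diag(np.ones(n - 2), 1)
             - np.diag(np.ones(n - 2), -1)) / h
        print(f"h = {h:7.4f}   cond(A) = {np.linalg.cond(A):10.1f}"
              f"   cond(A)*h^2 = {np.linalg.cond(A) * h * h:6.3f}")
    # The last column stabilizes as h decreases, consistent with
    # cond(A) ~ h^{-2}.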


B. Appendix: Results Used in Computational Experiments

B.1 The Stopping Criterion for PCG

Since the conjugate gradient method is an iterative method, a practical criterion is needed to determine when to stop the iteration. We need to be certain that our approximation $x$ is close to the true solution $x^*$ of the system $Ax = b$ in the sense that $\|x - x^*\|_B \le \varepsilon$ for some symmetric positive definite matrix $B$ and $\varepsilon > 0$. Perhaps the most commonly used choice is $B = A^T A$, for which $\|x - x^*\|_B = \|b - Ax\|$, leading to a bound on the residual norm. Thus we can check whether the relative residual, measured in the Euclidean norm, is smaller than the required precision $\varepsilon$ and, if so, the iteration is terminated. The weaknesses of this measure of convergence are notoriously well known; a criterion with a better vision of the true distance of the current approximation from the exact solution is needed. Since the distance of the last iterate to the true solution is unavailable, we use a test based on the residual and an estimate of the condition number. The stopping criterion used in our numerical experiments was based on the following estimate of the relative error measured in the $A$-energy norm (cf. Ashby, Manteuffel and Saylor [3]):

    \frac{\|x - x^*\|_A}{\|x^*\|_A} \le \sqrt{\mathrm{cond}(M^{-1}A)}\;
    \frac{(M^{-1}r, r)^{1/2}}{(M^{-1}b, b)^{1/2}} \le \varepsilon,    (B.1)


with $\|\cdot\|$ and $(\cdot,\cdot)$ denoting the Euclidean norm and inner product, respectively. The iteration will be terminated when the second inequality in (B.1) is satisfied. We used an approximation of $\mathrm{cond}(M^{-1}A)$ obtained from the extreme eigenvalues of a tridiagonal Lanczos matrix. This matrix can be constructed from the data obtained during the conjugate gradient iteration. In addition, the two eigenvalues of the Lanczos matrix do not have to be computed in each iteration. We start by taking 1 as the first approximation of $\mathrm{cond}(M^{-1}A)$ and compute a new approximation of the extreme eigenvalues of the Lanczos matrix only when (B.1) is satisfied with the current estimate of $\mathrm{cond}(M^{-1}A)$. If (B.1) holds also for the new estimate, we terminate the iteration process.

The approximation of $\mathrm{cond}(M^{-1}A)$ was obtained as follows. We are interested in the extreme eigenvalues $\lambda$ of the generalized eigenvalue problem $Ax = \lambda M x$. Alternatively, we are searching for the extreme values of the Rayleigh quotient

    RQ(u) = \frac{(Au, u)}{(Mu, u)}

on the Krylov subspace $\mathcal{K} = \{r, M^{-1/2} A M^{-1/2} r, \ldots\}$. In each step of the preconditioned conjugate gradient method (Algorithm 2) we solve the problem $M z_k = r_k$ and compute

    p_k = z_{k-1} + \beta_k p_{k-1}.    (B.2)

Recall the orthogonality properties

    (z_i, r_j) = 0, \qquad (A p_i, p_j) = 0, \qquad i \ne j.


Let us create the unit upper bidiagonal matrix

    B_k = \begin{pmatrix}
            1 & -\beta_2 &        &          \\
              & 1        & \ddots &          \\
              &          & \ddots & -\beta_k \\
              &          &        & 1
          \end{pmatrix}.

It is easy to see that (B.2) is equivalent to $z_{k-1} = p_k - \beta_k p_{k-1}$. Denoting $Z_k = [z_0, \ldots, z_{k-1}]$ and $P_k = [p_1, \ldots, p_k]$, this fact can be written as

    Z_k = P_k B_k.

If we consider the $z_i \in \mathcal{K}$ normalized so that $\|z_i\|_M = 1$ (this amounts to using the scaling matrix $S_k = \mathrm{diag}((z_0^T M z_0)^{-1/2}, \ldots, (z_{k-1}^T M z_{k-1})^{-1/2})$), we have $\hat Z_k = Z_k S_k = P_k B_k S_k$. Let us now consider the Rayleigh quotient for any vector $u = \hat Z_k \xi \in \mathcal{K}$ with $\xi$ a unit vector:

    RQ(u) = \frac{\xi^T S_k B_k^T P_k^T A P_k B_k S_k \xi}{\xi^T S_k Z_k^T M Z_k S_k \xi}.

We observe that in the numerator $P_k^T A P_k = D_k$ is the diagonal matrix with entries $p_i^T A p_i$, and in the denominator $S_k Z_k^T M Z_k S_k = I$. Thus

    RQ(u) = \frac{\xi^T S_k B_k^T D_k B_k S_k \xi}{\xi^T \xi}.

Thus the values of the Rayleigh quotient are determined by the spectrum of the tridiagonal matrix $S_k B_k^T D_k B_k S_k$, a task easily accomplished by standard methods (cf. [51]).
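A compact sketch of this estimator in code (an illustrative reimplementation under my own naming, not the thesis software): it runs preconditioned CG, accumulates the $\beta_i$ and $p_i^T A p_i$ values, forms the tridiagonal matrix $S_k B_k^T D_k B_k S_k$, and uses its extreme eigenvalues in the stopping test (B.1).

    # Illustrative sketch (not the thesis code) of PCG with the stopping
    # criterion (B.1); cond(M^{-1}A) is estimated from the tridiagonal
    # matrix S_k B_k^T D_k B_k S_k built out of the CG coefficients.
    import numpy as np

    def pcg_with_estimate(A, b, Minv, eps=1e-6, maxit=500):
        x = np.zeros(len(b))
        r = b.copy()
        z = Minv(r)
        rz = r @ z                        # (M^{-1} r, r); note z_i^T M z_i = r_i^T z_i
        rz0 = rz                          # (M^{-1} b, b) for the relative test
        p = z.copy()
        betas, pAp_list, znorms = [], [], [np.sqrt(rz)]
        cond_est = 1.0                    # first approximation, as in the text
        for k in range(1, maxit + 1):
            Ap = A @ p
            pAp = p @ Ap
            pAp_list.append(pAp)
            alpha = rz / pAp
            x += alpha * p
            r -= alpha * Ap
            z = Minv(r)
            rz_new = r @ z
            # test (B.1); recompute the eigenvalues only if it passes
            if np.sqrt(cond_est * rz_new / rz0) < eps:
                cond_est = lanczos_cond(betas, pAp_list, znorms)
                if np.sqrt(cond_est * rz_new / rz0) < eps:
                    return x, k, cond_est
            beta = rz_new / rz
            betas.append(beta)
            znorms.append(np.sqrt(rz_new))
            p = z + beta * p
            rz = rz_new
        return x, maxit, cond_est

    def lanczos_cond(betas, pAp_list, znorms):
        """Extreme-eigenvalue ratio of S_k B_k^T D_k B_k S_k."""
        k = len(pAp_list)
        Bk = np.eye(k)
        for i in range(k - 1):            # superdiagonal of -beta coefficients
            Bk[i, i + 1] = -betas[i]
        Sk = np.diag(1.0 / np.asarray(znorms[:k]))
        Dk = np.diag(pAp_list)
        ev = np.linalg.eigvalsh(Sk @ Bk.T @ Dk @ Bk @ Sk)
        return ev[-1] / ev[0]

Here Minv is any callable applying the preconditioner, e.g. Minv = lambda r: r / A.diagonal() for Jacobi; since the eigenvalue solve is triggered only when the test already passes with the previous estimate, its cost is negligible relative to the iteration itself.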


References

[1] R. Adams, Sobolev Spaces, Academic Press, New York, 1975.

[2] G. Anagnostou, Y. Maday, C. Mavriplis, and A. T. Patera, On the mortar element method: generalizations and implementation, in Third International Symposium on Domain Decomposition Methods for Partial Differential Equations, T. F. Chan, R. Glowinski, J. Périaux, and O. B. Widlund, eds., SIAM, Philadelphia, 1990, pp. 157-173.

[3] S. F. Ashby, T. A. Manteuffel, and P. E. Saylor, A taxonomy for conjugate gradient methods, SIAM J. Numer. Anal., 27 (1990), pp. 1542-1568.

[4] C. Bernardi, Y. Maday, and A. Patera, A new nonconforming approach to domain decomposition: The mortar element method, in Collège de France Seminar, H. Brezis and J. L. Lions, eds., Pitman, Providence, RI, 1994.

[5] P. E. Bjørstad and J. Mandel, Spectra of sums of orthogonal projections and applications to parallel computing, BIT, 31 (1991).

[6] R. Blaheta, A multilevel method with overcorrection by aggregation for solving discrete elliptic problems, J. Comput. Appl. Math., 24 (1988), pp. 227.

[7] ---, Iterative Methods for Numerical Solving of the Boundary Value Problems of Elasticity, PhD thesis, Hornický ústav, Ostrava, 1989. In Czech.

[8] C. Börgers, The Neumann-Dirichlet domain decomposition method with inexact solvers on the subdomains, Numer. Math., 55 (1989), pp. 123-136.


[9] J. H. Bramble, A second order finite difference analogue of the first biharmonic boundary value problem, Numer. Math., 9 (1966), pp. 236-249.

[10] J. H. Bramble, J. E. Pasciak, and A. H. Schatz, The construction of preconditioners for elliptic problems by substructuring, I, Math. Comp., 47 (1986), pp. 103-134.

[11] ---, An iterative method for elliptic problems on regions partitioned into substructures, Math. Comp., 46 (1986), pp. 361-369.

[12] ---, The construction of preconditioners for elliptic problems by substructuring, IV, Math. Comp., 53 (1989), pp. 1-24.

[13] J. H. Bramble, J. E. Pasciak, and A. T. Vassilev, Analysis of non-overlapping domain decomposition algorithms with inexact solvers. To appear, 1996.

[14] J. H. Bramble, J. E. Pasciak, J. Wang, and J. Xu, Convergence estimates for multigrid algorithms without regularity assumptions, Math. Comp., 57 (1991), pp. 23-45.

[15] A. Brandt, Algebraic multigrid theory: The symmetric case, Appl. Math. Comput., 19 (1986), pp. 23-56.

[16] A. Brandt, S. F. McCormick, and J. W. Ruge, Algebraic multigrid (AMG) for sparse matrix equations, in Sparsity and Its Applications, D. J. Evans, ed., Cambridge Univ. Press, Cambridge, 1984.

[17] M. Brezina and P. Vaněk, One black-box iterative solver. In preparation, 1997.

[18] X.-C. Cai, An optimal two-level overlapping domain decomposition method for elliptic problems in two and three dimensions, SIAM J. Sci. Comp., 14 (1993), pp. 239-247.

[19] M. A. Casarin and O. B. Widlund, A hierarchical preconditioner for the mortar finite element method, Tech. Report 712, Courant Institute of Mathematical Sciences, 1996.


[20] T. F. Chan and B. Smith, Domain decomposition and multigrid algorithms for elliptic problems on unstructured meshes, CAM Report 93-42, Department of Mathematics, UCLA, 1993.

[21] T. F. Chan and B. Smith, Domain decomposition for unstructured mesh problems, in Proceedings of the Seventh International Symposium on Domain Decomposition Methods for Partial Differential Equations, D. E. Keyes and J. Xu, eds., 1993, pp. 175-189.

[22] P. Ciarlet, The Finite Element Method for Elliptic Problems, North-Holland, Amsterdam, 1978.

[23] P. G. Ciarlet, The Finite Element Method for Elliptic Problems, North-Holland, Amsterdam, New York, 1978.

[24] P. Concus, G. H. Golub, and D. P. O'Leary, A generalized conjugate gradient method for the numerical solution of elliptic PDE, in Sparse Matrix Computations, J. R. Bunch and D. J. Rose, eds., Academic Press, New York, 1976, pp. 309-332.

[25] L. Cowsar, J. Mandel, and M. F. Wheeler, Balancing domain decomposition for mixed finite elements, Math. Comp. To appear.

[26] Y.-H. De Roeck and P. Le Tallec, Analysis and test of a local domain decomposition preconditioner, in Fourth International Symposium on Domain Decomposition Methods for Partial Differential Equations, R. Glowinski, Y. Kuznetsov, G. Meurant, J. Périaux, and O. Widlund, eds., SIAM, Philadelphia, PA, 1991.

[27] M. Dryja, A finite element-capacitance matrix method for the elliptic problem, SIAM J. Numer. Anal., 20 (1983), pp. 671-680.

[28] M. Dryja, A method of domain decomposition for 3-D finite element problems, in First International Symposium on Domain Decomposition Methods for Partial Differential Equations, R. Glowinski, G. H. Golub, G. A. Meurant, and J. Périaux, eds., SIAM, Philadelphia, PA, 1988.

[29] M. Dryja, W. Proskurowski, and O. B. Widlund, A method of domain decomposition with cross points for elliptic finite element problems, in Optimal Algorithms, B. Sendov, ed., Publishing House of the Bulgarian Academy of Sciences, Sofia, 1986, pp. 97-111.


[30] M. Dryja, B. F. Smith, and O. B. Widlund, Schwarz analysis of iterative substructuring algorithms for elliptic problems in three dimensions, SIAM J. Numer. Anal., 31 (1994), pp. 1662-1694.

[31] M. Dryja and O. B. Widlund, Towards a unified theory of domain decomposition algorithms for elliptic problems, in Third International Symposium on Domain Decomposition Methods for Partial Differential Equations, T. F. Chan, R. Glowinski, J. Périaux, and O. B. Widlund, eds., SIAM, Philadelphia, 1990, pp. 3-21.

[32] M. Dryja and O. B. Widlund, Schwarz methods of Neumann-Neumann type for three-dimensional elliptic finite element problems, Tech. Report 626, Department of Computer Science, Courant Institute, March 1993. To appear in Comm. Pure Appl. Math.

[33] ---, Domain decomposition algorithms with small overlap, SIAM J. Sci. Comput., 15 (1994), pp. 604-620.

[34] R. E. Ewing, R. D. Lazarov, T. F. Russell, and P. S. Vassilevski, Local refinement via domain decomposition techniques for mixed finite element methods with rectangular Raviart-Thomas elements, in Domain Decomposition Methods for PDE's, T. F. Chan, R. Glowinski, J. Périaux, and O. B. Widlund, eds., SIAM, Philadelphia, 1990, pp. 98-114.

[35] V. Faber and T. A. Manteuffel, Necessary and sufficient conditions for the existence of a conjugate gradient method, SIAM J. Numer. Anal., 21 (1984), pp. 352-362.

[36] C. Farhat, P. S. Chen, and J. Mandel, Scalable Lagrange multiplier based domain decomposition method for time-dependent problems, Int. J. Numer. Meth. Engrg., (1995). To appear.

[37] C. Farhat and M. Lesoinne, Automatic partitioning of unstructured meshes for the parallel solution of problems in computational mechanics, Int. J. Numer. Meth. Engrg., 36 (1993), pp. 745-764.

[38] ---, Mesh partitioning algorithms for the parallel solution of partial differential equations, Appl. Numer. Math., 12 (1993), pp. 443-457.

[39] C. Farhat and F.-X. Roux, A method of finite element tearing and interconnecting and its parallel solution algorithm, Int. J. Numer. Meth. Engng., 32 (1991).


[40] G. E. Forsythe and E. G. Straus, On the best conditioned matrices, Proc. Amer. Math. Soc., 6 (1955), pp. 340-345.

[41] A. George, Nested dissection of a regular finite element mesh, SIAM J. Numer. Anal., 11 (1973), pp. 345-363.

[42] A. George and J. Liu, Computer Solution of Large Sparse Positive Definite Systems, Prentice-Hall, Englewood Cliffs, NJ, 1981.

[43] R. Glowinski and M. F. Wheeler, Domain decomposition and mixed finite element methods for elliptic problems, in First International Symposium on Domain Decomposition Methods for Partial Differential Equations, R. Glowinski, G. H. Golub, G. A. Meurant, and J. Périaux, eds., SIAM, Philadelphia, 1988, pp. 144-172.

[44] G. H. Golub and C. F. Van Loan, Matrix Computations, second ed., Johns Hopkins Univ. Press, Baltimore, MD, 1989.

[45] W. Hackbusch, Multigrid Methods and Applications, vol. 4 of Computational Mathematics, Springer-Verlag, Berlin, 1985.

[46] M. R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear systems, J. Res. Nat. Bur. Stand., 49 (1952), pp. 409-436.

[47] R. A. Horn and C. R. Johnson, Matrix Analysis, first ed., Cambridge University Press, Cambridge, 1985.

[48] T. J. R. Hughes and R. M. Ferencz, Fully vectorized EBE preconditioners for nonlinear solid mechanics: applications to large-scale three-dimensional continuum, shell and contact/impact problems, in First International Symposium on Domain Decomposition Methods for Partial Differential Equations, R. Glowinski, G. H. Golub, G. A. Meurant, and J. Périaux, eds., SIAM, Philadelphia, 1988, pp. 261-280.

[49] T. J. R. Hughes, R. M. Ferencz, and J. O. Hallquist, Large-scale vectorized implicit calculations in solid mechanics on a Cray X-MP/48 utilizing EBE preconditioned conjugate gradients, Comp. Meth. Appl. Mech. Engng., 61 (1987), pp. 215-248.


[50] T. J. R. Hughes, I. Levit, and J. Winget, An element-by-element solution algorithm for problems of structural and solid mechanics, Comp. Meth. Appl. Mech. Engng., 36 (1983), pp. 241-254.

[51] E. Isaacson and H. B. Keller, Analysis of Numerical Methods, John Wiley & Sons, New York, 1966.

[52] J. Křížková and P. Vaněk, Two-level preconditioner with small coarse grid appropriate for unstructured meshes, Numerical Linear Algebra with Applications, 3 (1996), pp. 255-274.

[53] P. Le Tallec, Neumann-Neumann domain decomposition algorithms for solving 2D elliptic problems with nonmatching grids, East-West J. Numer. Math., 1 (1993), pp. 129-146.

[54] ---. Personal communication, 1994.

[55] ---, Domain decomposition methods in computational mechanics, Computational Mechanics Advances, 1 (1994), pp. 121-220.

[56] P. Le Tallec, J. Mandel, and M. Vidrascu, Parallel domain decomposition algorithms for solving plate and shell problems, in Advances in Parallel and Vector Processing for Structural Mechanics (Proceedings, Athens, 1994), CIVIL-COMP Ltd., Edinburgh, 1994.

[57] P. Le Tallec and J. Rodrigues, Domain decomposition methods with nonmatching grids applied to fluid dynamics, in Proceedings of the VIII International Conference on Fluid Dynamics, K. Morgan, E. Oñate, J. Periaux, J. Peraire, and O. Zienkiewicz, eds., Pineridge Press, Barcelona, 1993, pp. 405-418.

[58] W. Leontief, The Structure of the American Economy 1919-1939, Oxford University Press, New York, 1951.

[59] P. Lions, On the Schwarz alternating method I, in First International Symposium on Domain Decomposition Methods for Partial Differential Equations, R. Glowinski, G. H. Golub, G. A. Meurant, and J. Périaux, eds., SIAM, Philadelphia, 1988, pp. 1-42.

[60] D. Luenberger, Linear and Nonlinear Programming, second ed., Addison-Wesley, Menlo Park, 1984.


[61] J. Mandel, On block diagonal and Schur complement preconditioning, Numer. Math., 58 (1990), pp. 79-93.

[62] ---, Balancing domain decomposition, tech. report, Computational Mathematics Group, University of Colorado at Denver, 1992. Communications in Applied Numerical Methods, to appear.

[63] ---, Balancing domain decomposition, Comm. in Numerical Methods in Engrg., 9 (1993), pp. 233-241.

[64] ---, Hybrid domain decomposition with unstructured subdomains, in Domain Decomposition Methods in Science and Engineering: The Sixth International Conference on Domain Decomposition, Y. A. Kuznetsov, J. Périaux, A. Quarteroni, and O. B. Widlund, eds., vol. 157, AMS, 1994. Held in Como, Italy, June 15-19, 1992.

[65] J. Mandel and M. Brezina, Balancing domain decomposition: Theory and computations in two and three dimensions, UCD/CCM Report 2, Center for Computational Mathematics, University of Colorado at Denver, 1993.

[66] ---, Balancing domain decomposition for problems with large jumps in coefficients, Math. Comp., 65 (1996), pp. 1387-1401.

[67] J. Mandel, S. F. McCormick, and J. Ruge, An algebraic theory for multigrid methods for variational problems, SIAM J. Numer. Anal., 25 (1988), pp. 91-110.

[68] J. Mandel and B. Sekerka, A local convergence proof for the iterative aggregation method, Lin. Alg. Applic., 51 (1983), pp. 163-172.

[69] J. Mandel and R. Tezaur, On the convergence of a substructuring method with Lagrange multipliers, UCD/CCM Report 33, Center for Computational Mathematics, University of Colorado at Denver, December 1994. Submitted to Numerische Mathematik.

[70] G. Marchuk, Methods for Computing Nuclear Reactors, GOSATOMIZDAT, Moscow, 1961. In Russian.


[71] A. M. Matsokin and S. V. Nepomnyaschikh, A Schwarz alternating method in subspaces, Izv. Vuzov, 10 (1985), pp. 61-66. Also in Soviet Mathematics, 10 (1985), pp. 78-84.

[72] J. Nečas, Les Méthodes Directes en Théorie des Équations Elliptiques, Academia, Prague, 1967.

[73] S. V. Nepomnyaschikh, Decomposition and fictitious domains methods for elliptic boundary value problems, in Fifth International Symposium on Domain Decomposition Methods for Partial Differential Equations, D. E. Keyes, T. F. Chan, G. Meurant, J. S. Scroggs, and R. G. Voigt, eds., SIAM, Philadelphia, 1992, pp. 62-72.

[74] J. S. Przemieniecki, Matrix structural analysis of substructures, Am. Inst. Aero. Astro. J., 1 (1963), pp. 138-147.

[75] J. K. Reid, On the method of conjugate gradients for the solution of large sparse systems of linear equations, in Large Sparse Sets of Linear Equations, J. K. Reid, ed., Academic Press, New York, 1972, pp. 231-254.

[76] K. Rektorys, Variational Methods in Mathematics, Science, and Engineering, D. Reidel, Dordrecht; Boston, 1977. Translated from the Czech by Michael Basch.

[77] J. W. Ruge, Algebraic multigrid (AMG) for geodetic survey problems, in Preliminary Proc. Internat. Multigrid Conference, Fort Collins, CO, 1983, Institute for Computational Studies at Colorado State University.

[78] J. W. Ruge and K. Stüben, Algebraic multigrid (AMG), in Multigrid Methods, S. F. McCormick, ed., vol. 3 of Frontiers in Applied Mathematics, SIAM, Philadelphia, PA, 1987, pp. 73-130.

[79] M. Sarkis, Two-level Schwarz methods for nonconforming finite elements and discontinuous coefficients, in Proceedings of the Sixth Copper Mountain Conference on Multigrid Methods, Volume 2, N. D. Melson, T. A. Manteuffel, and S. F. McCormick, eds., no. 3224, NASA, Hampton, VA, 1993, pp. 543-566.

[80] H. A. Schwarz, Gesammelte Mathematische Abhandlungen, vol. 2, Springer, Berlin, 1890, pp. 133-143. First published in Vierteljahrsschrift der Naturforschenden Gesellschaft in Zürich, volume 15, 1870, pp. 272-286.


[81] B. F. Smith, A domain decomposition algorithm for elliptic problems in three dimensions, Numer. Math., 60 (1991), pp. 219-234.

[82] K. Stüben, Algebraic multigrid (AMG): experiences and comparisons, Appl. Math. Comput., 13 (1983), pp. 419.

[83] R. Tezaur, P. Vaněk, and M. Brezina, Two-level method for solids on unstructured meshes. Submitted to SIAM J. Sci. Comp.

[84] T. E. Tezduyar and J. Liou, Element-by-element and implicit-explicit finite element formulations for computational fluid dynamics, in First International Symposium on Domain Decomposition Methods for Partial Differential Equations, R. Glowinski, G. H. Golub, G. A. Meurant, and J. Périaux, eds., SIAM, Philadelphia, 1988, pp. 281-300.

[85] T. E. Tezduyar, J. Liou, T. Nguyen, and S. Poole, Adaptive implicit-explicit and parallel element-by-element iterative schemes, in Proceedings of the Second International Symposium on Domain Decomposition Methods for Partial Differential Equations, T. F. Chan, R. Glowinski, J. Périaux, and O. B. Widlund, eds., SIAM, Philadelphia, 1989, pp. 443-463.

[86] P. Vaněk, J. Mandel, and M. Brezina, Algebraic multigrid on unstructured meshes, UCD/CCM Report 34, Center for Computational Mathematics, University of Colorado at Denver, December 1994. Submitted to SISC.

[87] P. Vaněk, J. Mandel, and M. Brezina, Algebraic multigrid by smoothed aggregation for second and fourth order elliptic problems, Computing, 56 (1996), pp. 179-196.

[88] P. Vaněk, R. Tezaur, M. Brezina, and J. Křížková, Two-level method with coarse space size independent convergence, in Domain Decomposition Methods in Sciences and Engineering, R. Glowinski, J. Périaux, Z. Shi, and O. Widlund, eds., John Wiley & Sons Ltd., New York, NY, 1997.

[89] J. Wang, New convergence estimates for multilevel algorithms for finite element equations. Submitted.


[90] J. Wang, Convergence analysis without regularity assumptions for multigrid algorithms based on SOR smoothing, SIAM J. Numer. Anal., 29 (1992), pp. 987-1001.

[91] A. Wathen. Personal communication, 1992.

[92] O. B. Widlund, An extension theorem for finite element spaces with three applications, in Numerical Techniques in Continuum Mechanics, W. Hackbusch and K. Witsch, eds., Notes on Numerical Fluid Mechanics, v. 16, Friedr. Vieweg und Sohn, Braunschweig/Wiesbaden, 1987, pp. 110-122. Proceedings of the Second GAMM-Seminar, Kiel, January 1986.

[93] O. B. Widlund, Iterative substructuring methods: algorithms and theory for elliptic problems in the plane, in First International Symposium on Domain Decomposition Methods for Partial Differential Equations, R. Glowinski, G. H. Golub, G. A. Meurant, and J. Périaux, eds., SIAM, Philadelphia, 1988, pp. 113-128.

[94] J. Xu, Iterative methods by subspace decomposition and subspace correction, SIAM Review, 34 (1992), pp. 581-613.

[95] K. Yosida, Functional Analysis, sixth ed., Springer-Verlag, Berlin, Heidelberg, 1980.