## Citation

- Permanent Link: http://digital.auraria.edu/AA00004113/00001
## Material Information
- Title:
- Using homotopy methods to solve nonsmooth equations
- Creator:
- Speight, Adam
- Publication Date:
- 1999
- Language:
- English
- Physical Description:
- ix, 66 leaves ; 28 cm
## Subjects
- Subjects / Keywords: Nonsmooth optimization (lcsh); Programming (Mathematics) (lcsh); Homotopy theory (lcsh); Homotopy theory (fast); Nonsmooth optimization (fast); Programming (Mathematics) (fast)
- Genre: bibliography (marcgt); theses (marcgt); non-fiction (marcgt)
## Notes
- Bibliography:
- Includes bibliographical references (leaves 64-66).
- General Note:
- Department of Mathematical and Statistical Sciences
- Statement of Responsibility:
- by Adam Speight.
## Record Information
- Source Institution: University of Colorado Denver
- Holding Location: Auraria Library
- Rights Management:
- All applicable rights reserved by the source institution and holding location.
- Resource Identifier: 44093351 (OCLC); ocm44093351
- Classification: LD1190.L622 1999m .S65 (lcc)
## Auraria Membership

## Full Text

USING HOMOTOPY METHODS TO SOLVE NONSMOOTH EQUATIONS

by Adam Speight
B.S., University of Colorado at Denver, 1997

A thesis submitted to the University of Colorado at Denver in partial fulfillment of the requirements for the degree of Master of Science, Applied Mathematics, 1999.

This thesis for the Master of Science degree by Adam Speight has been approved by Harvey J. Greenberg and Gary A. Kochenberger.

Speight, Adam (M.S., Applied Mathematics)
Using Homotopy Methods to Solve Nonsmooth Equations
Thesis directed by Professor Stephen C. Billups

ABSTRACT

This thesis presents a new method for solving nonsmooth systems of equations, based on probability one homotopy techniques for solving smooth equations. The method rests on a new class of homotopy mappings on locally Lipschitz continuous operators that employ smoothing techniques to ensure the homotopy map is smooth on its natural domain. The new homotopy method is embedded in a hybrid algorithm so that it may exploit the strong global convergence properties of homotopy methods while relying on a Newton method for local convergence, avoiding potential numerical problems associated with nonsmoothness near a solution. Several theoretical results regarding the convergence behavior of this class of homotopy methods are presented. The hybrid algorithm is applied to problems from a class of nonsmooth equations arising from a particular reformulation of the Mixed Complementarity Problem. This paper presents computational results from experimentation with the algorithm, as well as issues related to the algorithm's implementation.

In addition to presenting the hybrid algorithm, this thesis develops a suite of software routines in the MATLAB environment that allows developers to quickly create and evaluate homotopy-based solvers for nonsmooth equations.
This software provides routines to help reformulate Mixed Complementarity Problems into nonsmooth equations, and an interface by which users may solve complementarity problems specified in the GAMS modeling language. It also includes interfaces to several sophisticated routines from HOMPACK90, a well-established collection of FORTRAN90 codes for implementing homotopy methods.

This abstract accurately represents the content of the candidate's thesis. I recommend its publication.

Signed, Stephen C. Billups

ACKNOWLEDGMENTS

My most sincere thanks go to my advisor, Stephen Billups. For several years, he has been a constant source of advice, guidance, friendship, and inspiration, without which this thesis would never have been produced. I would also like to offer my heartfelt thanks to Gary Kochenberger for serving on my thesis committee, and to Harvey Greenberg for the same and for providing many valuable comments, which greatly improved the quality of this thesis. Thanks also go to Layne Watson for his valuable suggestions.

During the course of my education, several teachers have infused me with a drive to become more than I am and helped me acquire the means to do so. To these individuals, I am eternally grateful. In particular, I would like to thank James Smith for helping me to discover a larger world, Allen Holder for acquainting me with it, and David Fischer, Burton Simon, and Stephen Medema for giving me the grand tour.

Thanks go to the faculty, staff, and students of the mathematics department at the University of Colorado at Denver, who have provided me a supportive and pleasant environment in which to develop. Thanks also go to the Viotti family for helping me to keep perspective on what is really important in life. Finally, I wish to thank my family for years of support and prayers. Without them I surely would not enjoy the richness of life that I do today.

CONTENTS

1. Introduction
2. Background Material
2.1 Notation
2.2 Applications of Nonsmooth Equations
2.2.1 Inequality Feasibility Problem
2.2.2 Nonlinear Complementarity Problem
2.2.3 Variational Inequalities
2.3 The Mixed Complementarity Problem
2.3.1 Walrasian Equilibrium
2.4 MCP Reformulation
2.5 Newton's Method
2.5.1 Nonsmooth Newton Methods
2.6 Homotopy Methods
2.6.1 Theory and Definitions
2.6.2 Homotopy-based Algorithms
2.7 Dealing with Nonsmoothness
2.7.1 Complementarity Smoothers
3. The Algorithm
3.1 The Algorithm
3.2 Properties of
3.3 A Hybrid Algorithm
3.4 Solving Mixed Complementarity Problems
4. Implementation
4.1 HOMTOOLS
4.1.1 The Root Finding Package
4.1.2 The Utility Package
4.1.3 The Homotopy Smooth Package
4.1.4 The MCP Smooth Package
4.1.5 The GAMS MCP Package
4.1.6 The Algorithms Package
4.2 Solver Implementation
5. Results and Conclusions
5.1 Testing Procedure
5.2 Computational Results
5.3 Conclusions
5.4 Avenues for Further Research
References

FIGURES

2.1 A nonsmooth Newton's method with line search
2.2 A smoother for max(x, 0)
3.1 A hybrid algorithm
3.2 Procedure to evaluate an element of ∂_B H(x)

TABLES

4.1 Parameters for a HOMTOOLS compliant solver
4.2 Overview of the HOMTOOLS public interface
5.1 MCPLIB Test Problems
5.2 MCPLIB Test Problems (continued)
5.3 MCPLIB Test Problems (continued)

1. Introduction

The problem of solving systems of equations is central to the field of applied mathematics. Many robust and efficient techniques exist to solve systems that are linear or differentiable. In recent years, however, there has been an increasing need for techniques to solve systems of equations that are nonsmooth, i.e., not continuously differentiable. Much progress has been made in generalizing Newton's method to solve such systems [16].
Techniques developed in this vein are attractive because they maintain the fast local convergence properties that are characteristic of Newton-based methods for smooth equations, and they perform with great efficacy when applied to nonsmooth problems that are nearly linear in the sense that their merit functions do not contain local minima that are not solutions. For problems that are highly nonlinear, however, these Newton-based schemes tend to fail or require a carefully chosen starting point to produce a solution. Finding such a point may require a great deal of human intervention.

To solve highly nonlinear systems of equations that are smooth, another class of algorithms known as homotopy methods may be employed; these often prove more robust than their Newton-based counterparts. To date, very little progress has been made in extending homotopy methods into the domain of solving nonsmooth systems of equations. A principal objective of this thesis is to develop a class of techniques that extend conventional homotopy methods so they can be made to solve nonsmooth systems. A secondary objective of this thesis is to develop a set of software tools that allows for the rapid prototyping, development, and testing of homotopy-based solvers for nonsmooth equations.

Chapter 2 presents some applications of nonsmooth systems of equations and describes a Newton-based method to solve them. It then presents some background material that will be necessary for the remainder of the thesis, including some theory on probability one homotopy methods for smooth equations, algorithms for implementing the same, and some theory about smoothing techniques for nonsmooth functions. Chapter 3 describes our approach for using homotopy methods to solve nonsmooth equations.
It presents some theoretical results stating sufficient conditions under which the method should find a solution, and discusses how the method can be incorporated into a hybrid algorithm, which makes use of a Newton-based method to achieve fast local convergence.

Chapter 4 details a suite of MATLAB software called HOMTOOLS. This package provides a uniform framework by which algorithm developers may write homotopy-based solvers for nonsmooth systems in MATLAB and other environments and test them against sets of problems coming from a variety of sources, including GAMS. It also makes several of the routines from the HOMPACK90 suite of codes available in the MATLAB environment. The package provides several sophisticated routines pertinent to homotopy-based algorithms, allowing developers to leverage them and quickly write high-level algorithms. Chapter 4 also discusses how the homotopy method presented in Chapter 3 was implemented and integrated with HOMTOOLS.

Chapter 5 presents some computational results obtained from experimenting with the algorithm described in Chapters 3 and 4. It attempts to discern why the algorithm failed when it did and to characterize the algorithm's potential for usefulness in the future. This chapter concludes with a discussion of some future avenues for research.

2. Background Material

2.1 Notation

When discussing matrices, vectors, and vector-valued functions, subscripts are used to indicate components, whereas superscripts are used to indicate the iteration number or some other label. For example, A_i., A._j, and A_ij refer to the ith row, jth column, and (i, j)th entry of a matrix A, respectively, whereas x^k typically represents the kth iterate generated by an algorithm. In contrast to the above, for scalars or scalar-valued functions, we use subscripts to refer to labels so that superscripts can be reserved for exponentiation. The vector of all ones is represented by e.
Unless otherwise specified, ||·|| denotes the Euclidean norm. We use the notation (·)⁺, (·)⁻, and |·| to represent the plus, minus, and absolute value operators, respectively, for vectors. That is, x⁺ := (max(x_1, 0); ...; max(x_n, 0)), x⁻ := (max(−x_1, 0); ...; max(−x_n, 0)), and |x| := (|x_1|; ...; |x_n|). The symbols ℝ₊ and ℝ₊₊ refer to the nonnegative real numbers and the positive real numbers, respectively. The extended real numbers are denoted by ℝ̄ := ℝ ∪ {−∞, +∞}. The vectors l and u ∈ ℝ̄^n specify a set of lower and upper bounds. Throughout this thesis we assume that l < u. The symbol B_{l,u} represents the box defined by [l, u] := {x | l ≤ x ≤ u}.
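The componentwise plus and minus operators above satisfy x = x⁺ − x⁻ and |x| = x⁺ + x⁻. A minimal Python sketch (the editor's illustration, not part of the thesis's MATLAB software) checks these identities:

```python
def plus(x):
    """Componentwise plus operator: x+ = (max(x_i, 0))_i."""
    return [max(xi, 0.0) for xi in x]

def minus(x):
    """Componentwise minus operator: x- = (max(-x_i, 0))_i."""
    return [max(-xi, 0.0) for xi in x]

x = [3.0, -1.5, 0.0, -2.0]
# With these conventions, x = x+ - x- and |x| = x+ + x-.
assert [p - m for p, m in zip(plus(x), minus(x))] == x
assert [p + m for p, m in zip(plus(x), minus(x))] == [abs(xi) for xi in x]
```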
Real-valued functions are denoted with lower-case letters like f or φ, whereas vector-valued functions are represented by upper-case letters like F or Φ. For a function F : C ⊂ ℝ^n → ℝ^m, we define ∇_i F_j(x) := ∂F_j(x)/∂x_i. ∇F(x) is the n × m matrix whose (i, j)th element is ∇_i F_j(x). Thus, if f is a scalar-valued function, then ∇f(x) is a row vector. For a set C ⊂ ℝ^n, we denote its closure and boundary by C̄ and ∂C, respectively. The projection operator for the set C is denoted by π_C(·); that is, π_C(x) represents the projection (with respect to the Euclidean norm) of x onto the set C.

We will use the notation o(·) as follows: given a function H : ℝ^n → ℝ^m, o(H(x)) represents any function G : ℝ^n → ℝ^m satisfying lim_{x → x₀} ||G(x)|| / ||H(x)|| = 0.

A sequence {x^k} converging to x* is said to converge quadratically if

limsup_{k→∞} ||x^{k+1} − x*|| / ||x^k − x*||² < ∞,

and superlinearly if

lim_{k→∞} ||x^{k+1} − x*|| / ||x^k − x*|| = 0.

We say that two sets A and B are diffeomorphic if there is a continuously differentiable bijection β : A → B such that β⁻¹ is also continuously differentiable. The term almost everywhere will be used to modify properties, indicating that they hold on all elements of a set except on a subset having zero Lebesgue measure.

2.2 Applications of Nonsmooth Equations

The need to solve nonlinear equations arises frequently in the fields of mathematical and equilibrium programming, as well as elsewhere in applied mathematics and in several engineering disciplines. In [16] a number of applications for nonsmooth equations are presented, some of which are described in this section.

2.2.1 Inequality Feasibility Problem

Finding solutions to systems of inequalities is an important and often difficult task in applied mathematics. Suppose F : ℝ^n → ℝ^n is a locally Lipschitzian function and K is a polyhedral region in ℝ^n. The inequality feasibility problem is to find an x ∈ K such that F(x) ≥ 0.
(2.1)

We can reformulate this into a system of equations by letting H(x) = min(0, F(x)), where min(·, ..., ·) is the componentwise minimum function of a finite number of vectors. It is clear that H is a nonsmooth operator and that x* solves (2.1) if, and only if, H(x*) = 0.

2.2.2 Nonlinear Complementarity Problem

The nonlinear complementarity problem (NCP) is a tool commonly used to model equilibrium problems arising in economics and game theory. It also serves as a generalized framework for many of the problems faced in mathematical programming. Given a continuously differentiable function F : ℝ^n → ℝ^n, the problem NCP(F) is to find some x ∈ ℝ^n so that

0 ≤ x ⊥ F(x) ≥ 0,   (2.2)

where x ⊥ F(x) means that xᵀF(x) = 0. We may take several approaches to reformulate the problem NCP(F) into a nonsmooth system of equations. Define the functions

H_m(x) = min(x, F(x)),   (2.3)
H_N(x) = F(x⁺) + x⁻, and   (2.4)
H_FB(x) = ( x_i + F_i(x) − √(x_i² + F_i(x)²) )ᵢ₌₁,...,ₙ,   (2.5)

where x_i⁺ := max(x_i, 0) and x_i⁻ := min(x_i, 0). One may easily verify that the following are equivalent: (1) x* solves (2.2), (2) H_m(x*) = 0, (3) H_N(z) = 0 where x* = z⁺, and (4) H_FB(x*) = 0.

2.2.3 Variational Inequalities

Finite-dimensional variational inequalities provide a framework from which to generalize the nonlinear complementarity problem as well as other problems in mathematical programming. A comprehensive treatment of the variational inequality is available in [12]. Suppose K is some closed convex subset of ℝ^n and F : D → ℝ^n is a differentiable function over some open domain D ⊂ ℝ^n containing K. The problem VI(K, F) is to find some x* ∈ K so that

(y − x*)ᵀ F(x*) ≥ 0 for every y ∈ K.   (2.6)

To reformulate VI(K, F) into a system of equations, we may define the following functions in terms of the Euclidean projection operator π_K, which induces the nonsmoothness in the reformulations:

H_m(x) = x − π_K(x − F(x)), and   (2.7)
H_N(x) = F(π_K(x)) + (x − π_K(x)).
(2.8)

The notation for these functions is identical to those defined in (2.3) and (2.4) because if K = ℝ₊^n, the function H_m in (2.3) is equivalent to the function H_m in (2.7) and the function H_N in (2.4) is equivalent to H_N in (2.8). The function H_N is sometimes referred to as the normal map. That solving these equations is equivalent to solving VI(K, F) is shown in [12, Chapter 4].

2.3 The Mixed Complementarity Problem

One prominent class of problems we are interested in solving is the Mixed Complementarity Problem (MCP). Some prefer to refer to the MCP as the box-constrained variational inequality. This class of problems generalizes the NCP (Section 2.2.2) in that any component of the solution may be bounded above, bounded below, both, or unbounded altogether. We can reformulate the MCP into a system of nonsmooth equations in a fashion similar to the NCP reformulations. This thesis will pay particular attention to applying homotopy-based algorithms to a particular reformulation of this problem.

Given a rectangular region B_{l,u} := ∏ᵢ₌₁ⁿ [l_i, u_i] in (ℝ ∪ {±∞})^n defined by two vectors l and u, where −∞ ≤ l < u ≤ ∞, and a function F : B_{l,u} → ℝ^n, the Mixed Complementarity Problem MCP(F, B_{l,u}) is to find an x ∈ B_{l,u} such that for each i ∈ {1, ..., n}, either

(1) x_i = l_i and F_i(x) ≥ 0, or
(2) F_i(x) = 0, or
(3) x_i = u_i and F_i(x) ≤ 0.

This is equivalent to the condition that the componentwise median function satisfies mid(x − l, x − u, F(x)) = 0. Sometimes when these conditions are satisfied we write F(x) ⊥ x and say that x is complementary to F(x). Typically, one assumes that the function F above is continuously differentiable on some open set containing B_{l,u}.

2.3.1 Walrasian Equilibrium

MCPs are useful for modeling systems that have many agents, each of whom is optimizing a function. These functions may be interdependent in that the choice variables of all the agents may be parameters in the objective function of each agent.
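Before continuing, the reformulations of Sections 2.2.2 and 2.3 can be illustrated concretely. The following Python sketch (the editor's illustration, not the thesis's MATLAB suite; the function names and the tiny example NCP are the editor's) evaluates the min, Fischer-Burmeister, and mid residuals and checks that they vanish at a known solution:

```python
import math

def h_min(x, F):
    """Componentwise-min reformulation (2.3): residual is 0 iff x solves the NCP."""
    return [min(xi, fi) for xi, fi in zip(x, F(x))]

def h_fb(x, F):
    """Fischer-Burmeister reformulation (2.5): phi(a, b) = a + b - sqrt(a^2 + b^2)."""
    return [a + b - math.sqrt(a * a + b * b) for a, b in zip(x, F(x))]

def h_mid(x, F, l, u):
    """MCP residual: mid(x - l, x - u, F(x)), the componentwise median."""
    return [sorted((xi - li, xi - ui, fi))[1]
            for xi, li, ui, fi in zip(x, l, u, F(x))]

# A tiny NCP: F(x) = (x1 - 1, x2); its solution is x* = (1, 0).
F = lambda x: [x[0] - 1.0, x[1]]
x_star = [1.0, 0.0]
print(h_min(x_star, F))                                    # -> [0.0, 0.0]
print(h_fb(x_star, F))                                     # -> [0.0, 0.0]
# With l = 0 and u = +inf, the MCP residual reduces to min(x, F(x)).
print(h_mid(x_star, F, [0.0, 0.0], [math.inf, math.inf]))  # -> [0.0, 0.0]
```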
These interdependent systems of nonlinear programs may be reformulated into an MCP by viewing the combined Karush-Kuhn-Tucker conditions of each agent's problem as one large system of inequalities and noticing that together these systems form one large MCP. This reformulation is useful for recasting and solving many models arising in game theory and economics.

Of particular interest to microeconomists is the topic of general economic equilibrium. General equilibrium models usually incorporate many consumers, each of whom is maximizing their own utility, and a production mechanism usually consisting of firms who are profit maximizers. A Walrasian equilibrium is characterized by having a set of prices, production activities, and consumption activities such that no good is in positive excess demand [19, Chapter 17.2].

Suppose there are m consumers in an economy and they may choose to purchase from a set of n goods, which can be purchased in arbitrarily fine quantities. If each consumer has well-behaved preferences, then m quasi-convex utility functions (u_i : ℝ^n → ℝ, i = 1, ..., m) exist describing the consumers' preferences [19]. If each consumer has an initial endowment of goods E^i ∈ ℝ^n, i = 1, ..., m, and p ∈ ℝ^n is a given vector of prices, then the ith consumer faces the following problem:

maximize_x u_i(x) subject to pᵀx ≤ pᵀE^i.   (2.9)

Solving each of these m nonlinear programs parametrically in terms of the price vector p gives rise to demand operators for each consumer, d^i(p), where d^i_j(p) is the amount of good j that consumer i will consume at price level p [19, Chapter 7].

For the production side of the model, suppose there are k production sectors. Each sector j has an associated production level y_j that produces some amount (possibly negative) of each good. It is usual to assume that production is linear; that is, the set of producible goods is a convex cone in ℝ^n [15]. If vectors of output y and z can be produced, then so can any combination αy + βz where α and β
are nonnegative scalars. Because there are a finite number of production sectors, the production cone is also an unbounded polyhedron, so we may represent the production set by a technology matrix A ∈ ℝ^{n×k}. The element A_ij represents the amount of good i resulting from a unit of activity in sector j. A set of equilibrium conditions adapted from [18] follows:

(1) No activity earns a positive profit: pᵀA ≤ 0.
(2) No good is in excess demand: Σ_{i=1}^m E^i + Ay − Σ_{i=1}^m d^i(p) ≥ 0.
(3) Positive prices and production: p ≥ 0, y ≥ 0.
(4) No losses or positive profits: (pᵀA) y = 0.
(5) Goods in excess supply are free, and all goods with positive price are sold: pᵀ(Σ_{i=1}^m E^i + Ay − Σ_{i=1}^m d^i(p)) = 0.

The equations (4) and (5) indicate that the inequality (1) is complementary to y and the inequality (2) is complementary to p. The above system can be viewed as a Nonlinear Complementarity Problem because each component of y and p is bounded below by 0 and above by infinity.

2.4 MCP Reformulation

This section describes a class of operators such that any root of a particular operator is a solution to an associated MCP, and any solution to that MCP is a root of its operator. Classes of both smooth and nonsmooth operators exist with this property, each with its own advantages. This section
If the lower bound l is zero and upper bound u is infinite, then the MCP function ipi,u(x,f) is also an NCP function because 0 = mid(a; l,x u, /) = mid(:r, -oo, /) = min(:c, /) if, and only if, ipi,u{xj) = 0. One very popular NCP function is the Fischer-Burmeister function [9, 10], A simple check reveals that this is an NCP function. While (ppB is not dif- ferentiable at the origin, tion well suited for use in globalization strategies for Newton-based methods. 11 The function (f)FB is also semismooth (see Definition 2.5.3). This fact will be useful when we solve systems of semismooth equations incorporating the Fischer-Burmeister function. Using either the pairwise minimum function or the Fischer-Burmeister function as (j), Billups [1] was able to construct the MCP function: ip(l, u, x, f) := Constructing a function H : from a box BiiU defined by vectors l and u and a C2 operator F, Hi{x) := ip(lu m, xit Fi(x)) (2.12) yields a reformulation of the MCP (F,13ltU). H(x) = 0 if and only if a; is a solution to MCP(B;,U,F)[1]. Solving the nonsmooth equation H{x) = 0 will be the focus of much attention in later chapters. 2.5 Newtons Method Newtons method is a standard technique for solving the equation F(x) = 0. (2.13) If F : > 5Rn is a continuously differentiable, or smooth, operator and a;0 is a starting point in 3?n, the sequence of iterates a;^1 := xk ^JF{xk)}~1F{xk), * = 0,1,... (2.14) is well-defined and converges quadratically under conditions summarized in the following theorem. Theorem 2.5.1 [7]. Let F : 9?71 > be continuously differentiable in an open convex set D C 5Jn. Assume there is some x* G 5Rn and r,{3 > 0, so 12 that Br(z*) C D,F(x*) = 0, VF(x*)_1 exists with || VJP(a:*)11| < /? 
and ∇F is Lipschitz continuous with constant γ on B_r(x*). Then there exists some ε > 0 such that for every x⁰ ∈ B_ε(x*) the sequence generated by (2.14) is well defined, converges to x*, and obeys

||x^{k+1} − x*|| ≤ βγ ||x^k − x*||², k = 0, 1, ....

This generic version of Newton's method may perform poorly if ∇F is nearly singular at a solution, or may fail altogether if it is started from a point that is too far from a solution. To improve the domain of convergence of the algorithm, a line search strategy can be added. This is accomplished by defining the iterates as x^{k+1} = x^k + α_k d^k, where d^k solves

∇F(x^k) d^k = −F(x^k)   (2.15)

and α_k is chosen to ensure sufficient descent of the merit function

θ(x) := ½ ||F(x)||².   (2.16)

One popular criterion for choosing α_k is the Armijo rule: choose some α, σ ∈ (0, 1) and let α_k = α^m, where m is the smallest integer such that

θ(x^k + α^m d^k) ≤ θ(x^k) − 2σα^m θ(x^k).

This rule is used in the algorithm presented in Figure 2.1. More sophisticated line search techniques are presented in [14].

Figure 2.1. A nonsmooth Newton's method with line search

Step 1 [Initialization] Choose line search parameters α and σ in (0, 1), an initial point x⁰ in ℝ^n, and a stopping tolerance ε. Set the iteration index k = 0.

Step 2 [Direction Selection] Pick some V_k ∈ ∂_B F(x^k). If V_k is singular, return a failure message along with the last iterate x^k and stop. Otherwise let d^k be the unique solution to

V_k d^k = −F(x^k).   (2.17)

Step 3 [Determine Step Length] Let m_k be the smallest nonnegative integer m so that θ(x^k + α^m d^k) − θ(x^k) ≤ −2σα^m θ(x^k). Set x^{k+1} = x^k + α^{m_k} d^k.

Step 4 [Check for Termination] If θ(x^{k+1}) < ε, then stop, returning the solution x^{k+1}. Otherwise increment k by one and return to Step 2.

2.5.1 Nonsmooth Newton Methods

Developing versions of Newton's method that require weaker assumptions regarding the differentiability of F has been the focus of much research in recent years.
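The damped Newton iteration of Figure 2.1 can be sketched compactly. The following Python illustration (the editor's, not the thesis's MATLAB code) uses a forward-difference Jacobian in place of an element of ∂_B F, which suffices for a smooth test problem:

```python
import math

def jacobian(F, x, h=1e-7):
    """Forward-difference approximation of the Jacobian of F at x."""
    n, fx = len(x), F(x)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xh = list(x)
        xh[j] += h
        fh = F(xh)
        for i in range(n):
            J[i][j] = (fh[i] - fx[i]) / h
    return J

def solve2(J, b):
    """Solve a 2x2 linear system by Cramer's rule (enough for this demo)."""
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    return [(b[0] * J[1][1] - b[1] * J[0][1]) / det,
            (J[0][0] * b[1] - J[1][0] * b[0]) / det]

def newton_armijo(F, x, alpha=0.5, sigma=1e-4, eps=1e-12, max_iter=50):
    """Newton's method with an Armijo line search on theta(x) = 0.5*||F(x)||^2."""
    theta = lambda y: 0.5 * sum(fi * fi for fi in F(y))
    for _ in range(max_iter):
        if theta(x) < eps:
            break
        d = solve2(jacobian(F, x), [-fi for fi in F(x)])  # Newton direction (2.15)
        m = 0
        while m < 30:  # Armijo rule: theta(x + a^m d) <= theta(x) - 2*sigma*a^m*theta(x)
            step = alpha ** m
            trial = [xi + step * di for xi, di in zip(x, d)]
            if theta(trial) <= theta(x) - 2.0 * sigma * step * theta(x):
                break
            m += 1
        x = [xi + (alpha ** m) * di for xi, di in zip(x, d)]
    return x

# Example: F(x) = (x1^2 - 2, x1*x2 - 2) has a root at (sqrt(2), sqrt(2)).
F = lambda x: [x[0] ** 2 - 2.0, x[0] * x[1] - 2.0]
root = newton_armijo(F, [2.0, 1.0])
print(root)  # both components close to sqrt(2) = 1.41421...
```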
Much of this research attempts to assume only the property of semismoothness about F and employs the notion of a generalized Jacobian or subdifferential wherever F is not differentiable.

Definition 2.5.2 Let F : ℝ^n → ℝ^m be a locally Lipschitzian function. By Rademacher's Theorem, F is differentiable except on a set of Lebesgue measure zero. Let D_F be the set on which F is differentiable. Define the B-subdifferential of F by

∂_B F(x) := { V | ∃ {x^k} → x, x^k ∈ D_F, V = lim_{k→∞} ∇F(x^k) }.

The Clarke subdifferential of F is the convex hull of ∂_B F(x) and is denoted by ∂F(x).

It is clear that if F is differentiable at a point x, then either characterization of the subdifferential is simply the singleton set containing only ∇F(x). In this regard, the subdifferential generalizes the gradient.

Definition 2.5.3 A function F : ℝ^n → ℝ^m is called semismooth at x if

lim_{V ∈ ∂F(x + th′), h′ → h, t ↓ 0} V h′

exists for every h in ℝ^n.

The apparent awkwardness of the definition of semismoothness is due to the fact that much of nonsmooth analysis requires a condition weaker than differentiability to be useful and stronger than local Lipschitz continuity to be powerful. Semismoothness is such a property. In [17], Qi presents some alternate characterizations of semismoothness. The most intuitive interpretation, however, is that semismoothness is equivalent to the uniform convergence of directional derivatives in all directions. It is also worth noting that the class of semismooth functions is closed under composition.

In [17], a version of Newton's method based on the Clarke subdifferential was presented. To find a solution to the equation F(x) = 0, starting from a point x⁰, the iterative map

x^{k+1} := x^k − [V_k]⁻¹ F(x^k),   (2.18)

where V_k ∈ ∂F(x^k), may be followed. An algorithm presented in [1, Figure 1] is outlined in Figure 2.1. It presents an algorithm similar to the above Newton's method but adds a line search strategy and uses a merit function θ (for example, θ(x) := ½ ||F(x)||²). It also differs in that V_k is restricted to be in the B-subdifferential, whereas [17] uses the Clarke subdifferential. The convergence analysis for this particular method uses the concepts of semicontinuity and BD-regularity.

Definition 2.5.4 Suppose that F : ℝ^n → ℝ^m is B-differentiable in a neighborhood of x. We say that the directional derivative F′(·; ·) is semicontinuous at x if, for every ε > 0, there exists a neighborhood N of x such that, for all x + h ∈ N,

||F′(x + h; h) − F′(x; h)|| ≤ ε ||h||.

We say that F′(·; ·) is semicontinuous of degree 2 at x if there exist a constant L and a neighborhood N of x such that, for all x + h ∈ N,

||F′(x + h; h) − F′(x; h)|| ≤ L ||h||².

The following definition generalizes the notion of nonsingularity. In nonsmooth analysis, regularity assumptions on functions are highly analogous to the nonsingularity of Jacobians in the analysis of smooth operators.
It also differs in that Vk is restricted to be in the B-subdifferential, whereas [17] uses the Clarke subdifferential. The convergence analysis for this particular method uses the concepts of semicontinuity and BD-regularity. Definition 2.5.4 Suppose that F : is B-differentiable in a neigh- borhood of x. We say that the directional derivative F'(-; ) is semicontinuous at x if, for every e > 0, there exists a neighborhood N of x such that, for all x -f- h Â£ N, H-F'Or + M) -F'faJOII < elM- We say that F'(-] ) is semicontinuous of degree 2 at x if there exist a constant L and a neighborhood N of x such that, for all x + h N, ll-F^z + h]h) F'{x\ h)]| < L ||/i|[2. The following definition generalizes the notion of nonsingularity. In nonsmooth analysis, regularity assumptions on functions are highly analogous to the nonsingularity of Jacobians in the analysis of smooth operators. 16 Definition 2.5.5 A semismooth operator F is called BD-regular at x if every element of 8bF(x) is nonsingular. The local convergence results for the nonsmooth Newtons method presented in Figure 2.1 are presented in the following theorem. Theorem 2.5.6 [1] Suppose that x* is a solution to F(x) = 0, and that F is semismooth and BD-regular at x*. Then, the iteration method defined by a;A:+1 = xk -f- dk, where dk is given by (2.17) is well defined and converges to x* superlinearly in a neighborhood of x*. In addition, if F(xk) ^ 0 for all k, then ihzt+i) lim !' k-K ||F(a;fc)|| = 0. If, in addition, F is directionally differentiable in a neighborhood of x* and F'(-; ) is semicontinuous of degree 2 at a;*, then the convergence of the itera- tions is quadratic. Newton-based methods perform well on problems whose merit functions do not have non-zero local minima. For problems with higher degrees of nonlinearity, Newton-based methods tend to stall by converging to a local minimum of their merit function, or otherwise fail by diverging altogether. 
The local convergence theory presented above guarantees that they will produce a solution within an arbitrary tolerance provided that they are started sufficiently near to a solution. For many highly nonlinear problems this is not practical, and a more robust, globally convergent method is desired. 2.6 Homotopy Methods A homotopy is a topological construction representing a function that continuously deforms an element of a function space into another element of that space. Suppose we are trying to solve the smooth equation F(x) = 0 17 where F : 5Rn 5Rn. An example of a simple homotopy on F is p(a,\,x) := AF(x) + (1 A)Ga{x) where A G [0,1] is called the homotopy parameter and a is some fixed vector in 5Rn. Here, Ga is some trivial class of operators on 5Rn. An example may be Ga(x) = x a. It is clear that when A = 1 we have that p{a, l,x) = F(x) and when A = 0, p(a, 0, x) = Ga(x). Much progress has been made in using the theory of homotopy map- pings to construct techniques for finding zeros of smooth functions. This sec- tion describes some theory of probability one homotopy methods and describes a standard algorithm to implement them. The term probability one applies because under appropriate conditions, a path from the starting point, which is uniquely determined by the parameter a, to a solution will fail to exist only on a set of those a having zero Lesbegue measure. 2.6.1 Theory and Definitions Much of the theory from prob- ability one homotopy methods can be derived from generalizations of a result in fixed-point theory called Sards Theorem. This class of theorems uses a notion of a regular value. Definition 2.6.1 Let U C be open and F : 5ftn > be a smooth function. We say that y G is a regular value of F if Range VF(x) = ?Rm, for all x G F-1(y). Theorem 2.6.2 (Parameterized Sards Theorem) [5] Let V C 5Â£9, U C $Rm be open sets and let T : V x U > be of class Cr, where r > max{0,m p}. 
If $0 \in \mathbb{R}^p$ is a regular value of $\Gamma$, then for almost every $a \in V$, 0 is a regular value of the function $\Gamma_a$ defined by $\Gamma_a(\cdot) := \Gamma(a, \cdot)$. Corollary 2.6.3 [5] Assume the conditions of the previous theorem. If $m = p + 1$, then each component of $\Gamma_a^{-1}(\{0\})$ is a smooth curve for almost every $a \in V$ with respect to $q$-dimensional Lebesgue measure. The following proposition gives conditions under which a well-behaved zero curve will exist. It is similar to results presented in [20] and [5, Theorem 2.4]. The path defined in the proposition reaches a zero of $F$ in the sense that it contains a sequence $\{(\lambda_k, x^k)\}$ that converges to $(1, \bar{x})$, where $\bar{x}$ is a zero of $F$. Proposition 2.6.4 Let $F : \mathbb{R}^n \to \mathbb{R}^n$ be a Lipschitz continuous function and suppose there is a $C^2$ map $\rho : \mathbb{R}^m \times [0,1) \times \mathbb{R}^n \to \mathbb{R}^n$ such that (1) $\nabla \rho(a, \lambda, x)$ has rank $n$ on the set $\rho^{-1}(\{0\})$, (2) the equation $\rho_a(0, x) = 0$, where $\rho_a(\lambda, x) := \rho(a, \lambda, x)$, has a unique solution $x_a \in \mathbb{R}^n$ for every fixed $a \in \mathbb{R}^m$, (3) $\nabla_x \rho_a(0, x_a)$ has rank $n$ for every $a \in \mathbb{R}^m$, (4) $\rho$ is continuously extendable (in the sense of Buck [3]) to the domain $\mathbb{R}^m \times [0,1] \times \mathbb{R}^n$, and $\rho_a(1, x) = F(x)$ for all $x \in \mathbb{R}^n$ and $a \in \mathbb{R}^m$, and (5) $\gamma_a$, the connected component of $\rho_a^{-1}(\{0\})$ containing $(0, x_a)$, is bounded for almost every $a \in \mathbb{R}^m$. Then for almost every $a \in \mathbb{R}^m$ there is a zero curve $\gamma_a$ of $\rho_a$, along which $\nabla \rho_a$ has rank $n$, emanating from $(0, x_a)$ and reaching a zero $\bar{x}$ of $F$ at $\lambda = 1$. Further, $\gamma_a$ does not intersect itself and is disjoint from any other zeros of $\rho_a$. Also, if $\gamma_a$ reaches a point $(1, \bar{x})$ and $F$ is regular at $\bar{x}$, then $\gamma_a$ has finite arc length.
Proof: Assumption (1) and the Parameterized Sard's Theorem give that 0 is a regular value of $\rho_a$ for almost every $a \in \mathbb{R}^m$. By Corollary 2.6.3, each component of $\rho_a^{-1}(\{0\})$ is a smooth curve and, as such, is either diffeomorphic to a circle or to an interval [5]. Because $\nabla_x \rho_a(0, x_a)$ has rank $n$, the Implicit Function Theorem gives $x$ in terms of $\lambda$ in a neighborhood of $(0, x_a)$. This implies that $\gamma_a$ is not diffeomorphic to a circle. Since $\rho_a$ is of class $C^2$, all limit points of $\gamma_a$ must lie in $\rho_a^{-1}(\{0\})$ (with $\rho$'s extended domain $\mathbb{R}^m \times [0,1] \times \mathbb{R}^n$), so the only limit point of $\gamma_a$ in $\{0\} \times \mathbb{R}^n$ is $(0, x_a)$. Furthermore, $\gamma_a \cap ((0,1) \times \mathbb{R}^n)$ is diffeomorphic to an interval, and $(0, x_a)$ corresponds to one end of it. Since $\gamma_a$ is bounded, there is some bounded closed set $B \subset \mathbb{R}^n$ such that $\gamma_a \subset [0,1] \times B$. By the Implicit Function Theorem, $\gamma_a$ cannot terminate (have an end point) in $(0,1) \times \mathbb{R}^n$, and by compactness $\gamma_a$ has finite arc length in every compact subset of $[0,1) \times B$. (For each point $z \in \gamma_a$, there is a neighborhood of $z$ in which $\gamma_a$ has finite arc length, by the Implicit Function Theorem and the fact that 0 is a regular value of $\rho_a$. For $z \notin \gamma_a$, there is a neighborhood of $z$ disjoint from $\gamma_a$, since $\mathbb{R}^n$ is normal. These neighborhoods form an open covering of $[0,1) \times B$, for which the finite subcovering of a compact subset $[0, 1 - \epsilon] \times B$ yields finite arc length for the portion of $\gamma_a$ in $[0, 1 - \epsilon] \times B$.) Therefore $\gamma_a$ must leave every set $[0, 1 - \epsilon] \times B$, $\epsilon > 0$, and have a limit point in $\{1\} \times B$. By continuity of $\rho_a$, if $(1, x^*)$ is a limit point of $\gamma_a$, then $x^*$ is a zero of $F$. That $\gamma_a$ does not intersect itself or any other component of $\rho_a^{-1}(\{0\})$ follows from the Implicit Function Theorem and the full rank of $\nabla \rho_a(\lambda, x)$ on $\rho^{-1}(\{0\})$: for each $z \in \gamma_a$ there is a neighborhood $B(z)$ such that $B(z) \cap \gamma_a$ is diffeomorphic to an open interval. If $(1, \bar{x})$ is a limit point of $\gamma_a$ and $F$ is regular at $\bar{x}$, then every element of $\pi_x \partial \rho_a(\lambda, x)$ is nonsingular. By the Implicit Function Theorem for nonsmooth functions (see [6, Corollary, page 256]), $x$ can be described as a Lipschitz continuous function of $\lambda$ in a neighborhood of $(1, \bar{x})$, and therefore $\gamma_a$ has finite arc length in this neighborhood. Further, the zero curve has finite arc length outside this neighborhood by the above argument, and hence finite arc length everywhere. A function such as $\rho$ is called a homotopy mapping for $F$.
The zero curve $\gamma_a$ described above can be characterized as the connected component of the set $\rho_a^{-1}(\{0\})$ containing the point $(0, x_a)$. Because the path $\gamma_a$ is a smooth curve, it can be parameterized by its arc length away from $(0, x_a)$. This yields a function $\gamma_a(s)$, the point on $\gamma_a$ of arc length $s$ away from $(0, x_a)$. Given a homotopy mapping $\rho$, we may construct a globally convergent algorithm as follows. Choose an arbitrary $a \in \mathbb{R}^m$. This determines a unique starting point $x_a$. With probability one, we may then track the zero curve $\gamma_a$ to a point $(1, \bar{x})$, where $\bar{x}$ is a solution. A simple and particularly useful homotopy mapping is $\rho : \mathbb{R}^n \times [0,1) \times \mathbb{R}^n \to \mathbb{R}^n$ given by $$\rho(a, \lambda, x) := \lambda F(x) + (1 - \lambda)(x - a). \quad (2.19)$$ If $F$ is a $C^2$ operator, then $\rho$ satisfies properties (1), (2), (3), and (4), but not necessarily (5), of Proposition 2.6.4. Properties (2), (3), and (4) are satisfied trivially; that $\rho$ satisfies property (1) is shown in [21]. The following theorem gives conditions on $F$ under which the fifth condition is satisfied. Theorem 2.6.5 [21] Let $F : \mathbb{R}^n \to \mathbb{R}^n$ be a $C^2$ function such that for some $\bar{x} \in \mathbb{R}^n$ and $r > 0$, $$(x - \bar{x})^T F(x) \ge 0 \quad \text{whenever } \|x - \bar{x}\| = r. \quad (2.20)$$ Then $F$ has a zero in a closed ball of radius $r$ about $\bar{x}$, and for almost every $a$ in the interior of this ball there is a zero curve $\gamma_a$ of $$\rho_a(\lambda, x) = \lambda F(x) + (1 - \lambda)(x - a),$$ along which $\nabla \rho_a(\lambda, x)$ has full rank, emanating from $(0, a)$ and reaching a zero $\bar{x}$ of $F$ at $\lambda = 1$. Further, $\gamma_a$ has finite arc length if $\nabla F(\bar{x})$ is nonsingular. The actual statement of the theorem in [21] fixes $\bar{x} = 0$; however, the proof can be modified trivially to yield the more general statement above. For convenience, this thesis shall refer to (2.20) as the global monotonicity property, as it is similar to the definition of monotonicity but ignores local behavior. If a $C^2$ operator $F$ possesses this property, these theoretical results have some profound implications.
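The hypothesis (2.20) is easy to test numerically for a given operator. The affine operator and the radius below are illustrative assumptions chosen so the condition of Theorem 2.6.5 demonstrably holds:

```python
import numpy as np

rng = np.random.default_rng(0)

def F(x):
    # Illustrative operator F(x) = x - c, whose unique zero is x = c.
    c = np.array([1.0, -2.0, 0.5])
    return x - c

# Sample points on the sphere ||x - xbar|| = r, with xbar = 0 and r
# chosen larger than ||c||, and check (x - xbar)^T F(x) >= 0 there,
# which is the global monotonicity property (2.20).
xbar = np.zeros(3)
r = 5.0
ok = True
for _ in range(1000):
    d = rng.standard_normal(3)
    x = xbar + r * d / np.linalg.norm(d)
    if (x - xbar) @ F(x) < 0.0:
        ok = False
print(ok)
```

For this $F$, $(x - \bar{x})^T F(x) = r^2 - x^T c \ge r^2 - r\|c\| > 0$ whenever $r > \|c\|$, so Theorem 2.6.5 guarantees a zero curve of finite arc length for almost every starting parameter $a$ in the ball.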
We are guaranteed the existence of a path of finite arc length between almost any starting point and a solution to $F(x) = 0$. In theory, to find a solution, one must simply follow the path from its start to a limit point of $\gamma_a$ where $\lambda = 1$. In practice, however, the task of constructing an algorithm that can reliably track all types of paths is very difficult. 2.6.2 Homotopy-based Algorithms Many of the homotopy-based algorithms for solving smooth systems of equations assume the conditions about rank, smoothness, and global monotonicity presented in Proposition 2.6.4 and Theorem 2.6.5. These assumptions guarantee the existence of a well-behaved zero curve, allowing developers of algorithms to focus on methods for tracking this curve from its beginning to a solution. Many packages exist to solve root finding problems using homotopy-based techniques [23]. This work will make use of the FIXPNF routine from the HOMPACK90 suite of software [22] [23, Section 3], which tracks the zero curve of a homotopy mapping specified by the user. This section describes the FIXPNF algorithm and interface in some detail because the algorithm and its parameters will be referred to later. FIXPNF takes the following input parameters: N - the dimension of the problem's domain. Y - an array of size N+1 containing the point at which to start tracking the curve. Y(1) is assumed to be zero and represents the lambda component; Y(2:N+1) represents the x components. IFLAG - sets the mode of FIXPNF to finding fixed points or roots. It should be -2 for the zero finding problem with a user-specified homotopy. A - the parameter a in Proposition 2.6.4. ARCRE, ARCAE, ANSRE, ANSAE - real valued parameters giving the relative and absolute error tolerances for the curve tracking (ARCRE, ARCAE) and for the answer (ANSRE, ANSAE). These are discussed in more detail later. SSPAR - a vector of eight real numbers. These allow the user a high level of control over the stepsize estimation process; they are discussed in detail later.
FIXPNF provides the following output variables: Y - an array of size N+1; Y(2:N+1) contains the solution. NFE - the number of function evaluations performed during the routine's execution. ARCLEN - an approximation of the arc length of the zero curve that was traversed. IFLAG - a parameter indicating the error status of the routine. A value of 1 is normal. The core of the FIXPNF routine uses an algorithm consisting of the following phases: prediction, correction, and stepsize estimation. The curve tracking algorithm generates a sequence of iterates $(\lambda_k, x^k)$ along the zero curve $\gamma_a$, beginning with $(0, x^0)$, where $x^0$ is determined by item (2) of Proposition 2.6.4, and converging to a point $(1, x^*)$ where $x^*$ solves $F(x) = 0$. Since $\gamma_a$ can be parameterized by its arc length from its starting point by differentiable functions $(\lambda(s), x(s))$, each iterate $(\lambda_k, x^k)$ has an associated $s_k$ such that $(\lambda_k, x^k)$ is $s_k$ units away from $(0, x^0)$ along $\gamma_a$. For the general iteration in the prediction phase, suppose that we have generated the two most recent points $P^1 = (\lambda(s_1), x(s_1))$ and $P^2 = (\lambda(s_2), x(s_2))$ along $\gamma_a$, with associated tangent vectors $(\frac{d\lambda}{ds}, \frac{dx}{ds})(s_1)$ and $(\frac{d\lambda}{ds}, \frac{dx}{ds})(s_2)$. We have also determined some $h > 0$ to be used as an optimal step size in arc length to take along $\gamma_a$ from $(\lambda(s_2), x(s_2))$. Because the function $(\lambda(s), x(s))$ is the point on $\gamma_a$ that is $s$ units of arc length away from $(0, x^0)$, it must be that $$\left\| \left( \frac{d\lambda}{ds}, \frac{dx}{ds} \right)(s) \right\| = 1 \quad (2.21)$$ for any $s$. This can be seen because, for $t > 0$, we have $\|(\lambda(s+t), x(s+t)) - (\lambda(s), x(s))\|_2 = t + o(t)$ by the definition of $\gamma_a(s)$; dividing both sides by $t$ and letting $t \downarrow 0$ gives (2.21). As a predictor for the next point on $\gamma_a$, FIXPNF uses $$Z^0 := p(s_2 + h), \quad (2.22)$$ where $p(s)$ is the unique Hermite cubic polynomial that interpolates $P^1$ and $P^2$. Formally, $$p(s_1) = (\lambda(s_1), x(s_1)), \qquad \frac{dp}{ds}(s_1) = \left( \frac{d\lambda}{ds}, \frac{dx}{ds} \right)(s_1),$$ $$p(s_2) = (\lambda(s_2), x(s_2)), \qquad \frac{dp}{ds}(s_2) = \left( \frac{d\lambda}{ds}, \frac{dx}{ds} \right)(s_2),$$ and each component of $p(s)$ is a cubic polynomial. The corrector phase is where the normal flow algorithm derives its name. As the vector $a$ varies, the paths $\gamma_a$ change.
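The Hermite cubic predictor (2.22) can be sketched directly from the standard Hermite basis. The curve and step sizes below are illustrative test data (a unit-speed circle), not taken from HOMPACK:

```python
import numpy as np

def hermite_cubic(s1, p1, t1, s2, p2, t2):
    # Unique vector-valued cubic p with p(s1) = p1, p'(s1) = t1,
    # p(s2) = p2, p'(s2) = t2, built componentwise from the standard
    # Hermite basis on [s1, s2].
    d = s2 - s1
    def p(s):
        t = (s - s1) / d
        h00 = 2*t**3 - 3*t**2 + 1
        h10 = t**3 - 2*t**2 + t
        h01 = -2*t**3 + 3*t**2
        h11 = t**3 - t**2
        return h00*p1 + h10*d*t1 + h01*p2 + h11*d*t2
    return p

# Two points and unit tangents on an arc-length-parameterized circle;
# predict the point h units of arc length past the second point, as in (2.22).
s1, s2, h = 0.0, 0.5, 0.25
p1 = np.array([np.cos(s1), np.sin(s1)]); t1 = np.array([-np.sin(s1), np.cos(s1)])
p2 = np.array([np.cos(s2), np.sin(s2)]); t2 = np.array([-np.sin(s2), np.cos(s2)])
pred = hermite_cubic(s1, p1, t1, s2, p2, t2)(s2 + h)
exact = np.array([np.cos(s2 + h), np.sin(s2 + h)])
print(np.linalg.norm(pred - exact))  # small extrapolation error
```

Because the tangents satisfy the unit-speed condition (2.21), the cubic tracks the curve closely, and the short extrapolation beyond $s_2$ gives a good starting point for the corrector.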
This generates a family of paths known as the Davidenko flow. The predicted point $Z^0$ will lie on a particular zero curve corresponding to some other path. The correction phase converges to a point on $\gamma_a$ along a path that is normal to the Davidenko flow. The corrector phase of FIXPNF uses a basic Newton-like method to converge from the predicted point to a point $Z^*$ on $\gamma_a$. The standard Newton's method may not be used because $\nabla \rho_a(\lambda, x)$ is a rectangular matrix. Numerical issues related to efficiently computing these iterates are discussed in [22]. Beginning with $Z^0$ defined by (2.22), the corrector iteration is $$Z^{k+1} := Z^k - \left[\nabla \rho_a(Z^k)\right]^+ \rho_a(Z^k),$$ where $\left[\nabla \rho_a(Z^k)\right]^+$ denotes the Moore-Penrose pseudoinverse of the rectangular matrix $\nabla \rho_a(Z^k)$ [11, Chapter 5]. Because of the computational expense incurred by matrix factorizations in the Newton steps, the corrector phase reports failure if it has not converged to within the user-specified tolerances in four iterations. The iterate $Z^{k+1}$ is assumed to have converged if $$\|Z^{k+1} - Z^k\| \le \text{ARCAE} + \text{ARCRE}\,\|Z^{k+1}\|.$$ The stepsize estimation phase of this algorithm is an elaborate process, which is highly customizable via changes to values in the input parameter array SSPAR. It attempts to balance progress along $\gamma_a$ against effort expended in the correction phase. This phase of the algorithm calculates an optimal step size $h^*$ for the next time through the algorithm. The SSPAR array contains eight floating point numbers, $$(\bar{L}, \bar{R}, \bar{D}, h_{\min}, h_{\max}, B_{\min}, B_{\max}, q).$$ To find the optimal step size for the next iteration we define a contraction factor $$L = \frac{\|Z^2 - Z^1\|}{\|Z^1 - Z^0\|},$$ a residual factor $$R = \frac{\|\rho_a(Z^1)\|}{\|\rho_a(Z^0)\|},$$ and a distance factor $$D = \frac{\|Z^1 - Z^*\|}{\|Z^0 - Z^*\|},$$ where $Z^*$ is the point to which the corrector iteration converged. The first three components of SSPAR represent ideal values for these quantities. If $h$ is the current step size, the goal is to bring each factor to its ideal value. Define $$\bar{h} = \left(\min\{\bar{L}/L,\, \bar{R}/R,\, \bar{D}/D\}\right)^{1/q} h$$ to be the tentative choice for $h^*$.
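The pseudoinverse corrector can be sketched for the simple homotopy (2.19). The test operator, the finite-difference Jacobian, and the starting point are illustrative assumptions; HOMPACK's actual implementation uses more careful linear algebra [22]:

```python
import numpy as np

def F(x):
    # Illustrative smooth operator on R^2 (not from the thesis).
    return np.array([x[0]**2 + x[1] - 3.0, x[0] + x[1] - 1.0])

def rho(a, lam, x):
    # The simple homotopy (2.19): rho_a(lambda, x) = lam F(x) + (1 - lam)(x - a).
    return lam * F(x) + (1.0 - lam) * (x - a)

def grad_rho(a, lam, x, eps=1e-7):
    # n x (n+1) Jacobian of rho_a with respect to (lambda, x), approximated
    # here by forward differences for simplicity.
    z = np.concatenate(([lam], x))
    n = x.size
    J = np.zeros((n, n + 1))
    base = rho(a, z[0], z[1:])
    for j in range(n + 1):
        zp = z.copy()
        zp[j] += eps
        J[:, j] = (rho(a, zp[0], zp[1:]) - base) / eps
    return J

def correct(a, z, iters=4):
    # Corrector: Z^{k+1} = Z^k - pinv(grad rho_a(Z^k)) rho_a(Z^k);
    # like FIXPNF, only a few Newton-like steps are taken.
    for _ in range(iters):
        J = grad_rho(a, z[0], z[1:])
        z = z - np.linalg.pinv(J) @ rho(a, z[0], z[1:])
    return z

a = np.array([0.0, 0.0])
z = np.array([0.3, 0.2, 0.1])      # a predicted point (lambda, x) off the curve
z = correct(a, z)
print(np.linalg.norm(rho(a, z[0], z[1:])))  # residual after correction
```

Because the Jacobian is $n \times (n+1)$, the pseudoinverse step is the minimum-norm Newton step, which moves the iterate back to the zero curve roughly perpendicular to the Davidenko flow.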
Now $h^*$ is chosen to be $$h^* = \min\{\max\{h_{\min},\, B_{\min} h,\, \bar{h}\},\, B_{\max} h,\, h_{\max}\}.$$ It remains to describe how the algorithm begins and ends. In the general prediction step, the algorithm uses a cubic predictor; this is possible because there are two points and corresponding tangent vectors from which to interpolate. So, to enter the general prediction phase, we must generate two points on $\gamma_a$ and find their associated tangent vectors. The first point is $(0, x^0)$, corresponding to the arc length $s_0 = 0$. To generate the point $(\lambda_1, x^1)$, we make a linear approximation of $\gamma_a$ from $(0, x^0)$, given an initial step size $h$: $$Z^0 = h \left( \frac{d\lambda}{ds}, \frac{dx}{ds} \right)(0) + (0, x^0).$$ We may then use the correction phase to generate the point $(\lambda_1, x^1)$. If the correction phase fails, the stepsize is reduced by half and the process is repeated. To evaluate $(\frac{d\lambda}{ds}, \frac{dx}{ds})$ at an $s$ corresponding to a particular point $(\lambda, x)$ on $\gamma_a$, we must solve the system $$\nabla \rho_a(\lambda, x) d = 0.$$ If $\nabla \rho_a(\lambda, x)$ has full rank, this equation determines a one dimensional subspace. However, by (2.21), we may take $\|d\| = 1$, so that $d$ is undetermined only in its sign. For the initial point $(0, x^0)$ we take the sign of $d$ to be that which makes $\frac{d\lambda}{ds}(0) > 0$. For subsequent points we choose the sign of $d$ so that $d^T d_{\text{last}} > 0$, where $d_{\text{last}}$ is the direction generated on the previous prediction step. This heuristic prevents the path from bending too quickly. It is well justified because $(\lambda(s), x(s))$ is a differentiable function, which implies there will always be a stepsize small enough to ensure the acuteness of these two directions. The FIXPNF routine determines that it has reached an end game state when it produces a pair of iterates $(\lambda_k, x^k)$ and $(\lambda_{k+1}, x^{k+1})$ such that $\lambda_k \le 1 < \lambda_{k+1}$. The solution then lies on $\gamma_a$ somewhere between the last two generated points. The algorithm then builds Hermite cubic predictors interpolating $(\lambda_k, x^k)$ and $(\lambda_{k+1}, x^{k+1})$ as above and attempts to generate progressively smaller brackets of points on $\gamma_a$ containing $(1, x^*)$.
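The tangent computation and sign heuristic above can be sketched with an SVD-based null space. The $1 \times 2$ Jacobian below corresponds to the illustrative scalar homotopy $\rho(\lambda, x) = \lambda + x - 1$, an assumption made for the example:

```python
import numpy as np

def tangent(J, d_last=None):
    # Unit tangent to the zero curve: a basis for the one-dimensional
    # null space of the n x (n+1) Jacobian J = grad rho_a(lambda, x),
    # normalized per (2.21). The sign is chosen so that d_lambda/ds > 0
    # at the initial point, and so that d . d_last > 0 thereafter.
    _, _, Vt = np.linalg.svd(J)
    d = Vt[-1]                      # right singular vector for the
    d = d / np.linalg.norm(d)       # smallest singular value spans ker J
    if d_last is None:
        if d[0] < 0.0:              # orient so d_lambda/ds > 0 at the start
            d = -d
    elif d @ d_last < 0.0:          # keep the direction from reversing
        d = -d
    return d

# For rho(lambda, x) = lambda + x - 1 on R^1, J = [1, 1] and the
# tangent is +/- (1, -1)/sqrt(2).
J = np.array([[1.0, 1.0]])
d0 = tangent(J)
d1 = tangent(J, d_last=d0)
print(d0, d1)
```

The SVD is one convenient way to extract the null space; HOMPACK itself obtains the tangent from the matrix factorizations it already computes for the corrector.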
The algorithm terminates when the end game either converges or fails. 2.7 Dealing with Nonsmoothness Since the functions considered in this paper are not of class $C^2$, we cannot apply a homotopy method directly to them. Instead, these functions need to be approximated by a family of smooth functions. Suppose we are interested in solving the system $F(x) = 0$, where $F$ is a nonsmooth operator, and suppose we are given a family of functions $F^\mu$ parameterized by a smoothing parameter $\mu$, so that $\lim_{\mu \downarrow 0} F^\mu = F$ in some sense. We would like the solutions to the systems $F^\mu(x) = 0$ to converge to a solution of $F(x) = 0$ along a smooth trajectory. Under suitable conditions this can be achieved [4]. Definition 2.7.1 Given a nonsmooth function $\varphi$, a smoother for $\varphi$ is a function $\psi$ such that (1)
(2) $\psi$ is twice continuously differentiable with respect to $x$ on the set $\mathbb{R}^p \times \mathbb{R}_{++}$.
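The effect of such a smoother can be illustrated numerically. The smoother $\psi(x, \mu) = \sqrt{x^2 + \mu^2}$ for the absolute value is a standard choice assumed here for illustration, and the equation and solver below are likewise illustrative:

```python
import numpy as np

def F(x):
    # Nonsmooth equation F(x) = |x| + x - 1 = 0, with unique root x = 1/2.
    return abs(x) + x - 1.0

def F_mu(x, mu):
    # Smoothed family: |x| replaced by sqrt(x^2 + mu^2), a standard
    # smoother assumed for this sketch.
    return np.sqrt(x * x + mu * mu) + x - 1.0

def newton(g, dg, x, iters=50):
    # Plain Newton's method; F_mu is smooth, so this is legitimate.
    for _ in range(iters):
        x = x - g(x) / dg(x)
    return x

for mu in [1.0, 0.1, 0.01, 0.001]:
    dF = lambda x, mu=mu: x / np.sqrt(x * x + mu * mu) + 1.0
    root = newton(lambda x, mu=mu: F_mu(x, mu), dF, 2.0)
    print(mu, root)   # roots (1 - mu^2)/2 approach 0.5 as mu -> 0
```

Here the smoothed roots can be computed in closed form as $(1 - \mu^2)/2$, so the trajectory of solutions converges smoothly to the solution $x = 1/2$ of the nonsmooth equation as $\mu \downarrow 0$, which is exactly the behavior the homotopy construction of this thesis exploits.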
Example 2.7.2 As a simple example, consider the following function:
(2.27)