Past

Optimization by Space Transformation and Decomposition

Abstract

We introduce a space transformation framework for unconstrained optimization of a function f that is smooth but not necessarily convex. At each iteration of the framework, the gradient of f is mapped by a forward transformation to an auxiliary (possibly lifted) space, in which a model is established based on the transformed gradient and a step is obtained by minimizing this model. A backward transformation then maps the step back to the original space, where the iterate is updated accordingly. Using a trust-region globalization scheme, and by inspection of the consistency between the forward and backward transformations, the framework is guaranteed to converge globally to first-order criticality with an O(\epsilon^−2) worst-case iteration complexity to reach a gradient with norm smaller than \epsilon>0. The complexity is improved to O(\epsilon^−1) and O(log(\epsilon^−1)) when f is convex and strongly convex respectively. The space transformation framework can be directly specialized to a parallel space decomposition framework for nonlinear optimization, which can be regarded as an extension of the parallel Schwarz domain decomposition method for PDEs and linear systems. Such a decomposition framework is motivated by problems that cannot be directly solved (or even saved) in the full space and hence divide-and-conquer is needed. As in domain decomposition, introducing overlaps between subspaces can improve the performance of space decomposition methods provided that overlaps are handled properly. Our space decomposition framework covers a wide range of overlap-handling strategies, including Restricted Additive Schwarz, Weighted Restricted Additive Schwarz, and Additive Schwarz with Harmonic Extension, all of which have achieved remarkable success in domain decomposition. Similar to the coarse grid techniques in domain decomposition, we incorporate a coarse space correction into our framework. An unsophisticated implementation of the framework works quite well in our numerical experiments. With Restricted Additive Schwarz and coarse space correction, the algorithm scales nicely when the number of subspaces increases. At the same time, the space transformation framework can be specialized to analyze trust-region methods when the gradient of f is evaluated inaccurately. Applying the theory of the space transformation framework, we provide sharp bounds on the gradient inaccuracy that guarantee global convergence of trust-region methods, and prove new global convergence results while recovering some existing ones. More importantly, we establish a complete theory of worst-case complexities for trust-region methods based on inaccurate gradients. If the gradient inaccuracy respects the aforementioned bounds, these complexities are of the same order as when the gradients are accurate, and the gradient inaccuracy only enlarges the multiplicative constants in them by a mild magnitude that is entirely decided by trust-region algorithmic parameters and independent of the optimization problem. On the other hand, once the gradient inaccuracy exceeds such bounds, the behavior of trust-region methods deteriorates dramatically, which is demonstrated by our numerical experiment. Our theory provides theoretical insights into the tight connection between algorithmic parameters and the amount of relative gradient error that trust-region methods can tolerate. We show that such a connection is clearly observable in computation. This enables us to propose parameters that are well adapted to a given accuracy level of the gradient. This talk is based on a joint project with Serge Gratton (INPT - ENSEEIHT, France) and Luis Nunes Vicente (Lehigh University, US).