Optimización - jldelafuenteoconnor.es · h i j d e f g a b c 10 8 7 9 465 12 3 Teorema Condiciones...

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

1/63Universidad Politécnica de Madrid–Escuela Técnica Superior de Ingenieros IndustrialesGrado en Ingeniería en Tecnologías Industriales. Curso 2015-2016-3º

Matemáticas de Especialidad–Ingeniería Eléctrica

OptimizaciónProgramación no lineal sin condiciones

José Luis de la Fuente O’[email protected]@upm.es

Clase_minimi_sincond_2016.pdf

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

2/63La OPTIMIZACIÓN es un lenguaje y forma de expresar entérminos matemáticos un gran número de problemas

complejos de la vida cotidiana y cómo se pueden resolver deforma práctica mediante los algoritmos numéricos adecuadosL’optimisation est une discipline combinant plusieurs domaines de compétences : les mathématiques décisionelles, les

statistiques et l’informatique. Cette méthode scietifique a pour but de maximiser ou de minimiser un objectif. En pratiquel’optimisation est souvent utilisée por augmenter la rentabilité ou diminuer les coûts.

An act, process, or methodology of making something (as a design, system, or decision) as fully perfect, functional, oreffective as possible; specifically: the mathematical procedures (as finding the maximum of a function) involved in this.

La optimización se estudia en dos grandes partes:

� Optimización sin condiciones.

minimizar f W Rn! R

� Optimización con condiciones

minimizarx2Rn

f .x/

sujeta a ci.x/D0; i 2 E ;cj .x/� 0; j 2 I:

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

3/63

¿Cómo se lleva a cabo un proyecto de optimización?

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

4/63

Índice� El problema sin condiciones

� Condiciones de mínimo

� Métodos de dirección de descenso

�Método del gradiente o de la máxima pendiente�Método de Newton�Métodos de Newton amortiguado y de Región de Confianza�Método de los gradientes conjugados�Métodos cuasi Newton

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

5/63

El problema sin condiciones

� Dar solución aminimizar f W Rn! R

La función f se supone continua en algún conjunto abierto de Rn y conderivadas parciales continuas hasta segundo orden en ese abierto.

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

6/63� Ejemplos Si x� es una constante, la función f .x/ D .x � x�/2 tiene unúnico punto donde alcanza el mínimo: en el valor de x�.

1. INTRODUCTION

In this lecture note we shall discuss numerical methods for the solution ofthe optimization problem. For a real function of several real variables wewant to find an argument vector which corresponds to a minimal functionvalue:

Definition 1.1. The optimization problem:

Findx∗ = argminxf(x) , wheref : IRn 7→ IR .

The functionf is called theobjective functionor cost functionandx∗ is theminimizer.

In some cases we want amaximizerof a function. This is easily determinedif we find a minimizer of the function with opposite sign.

Optimization plays an important role in many branches of science and appli-cations: economics, operations research, network analysis, optimal designof mechanical or electrical systems, to mention but a few.

Example 1.1. In this example we consider functions of one variable. The function

f(x) = (x− x∗)2

has one, unique minimizer,x∗, see Figure 1.1.

Figure 1.1:y = (x− x∗)2.One minimizer.

x

y

x*

1. INTRODUCTION 2

The function f(x) = −2 cos(x − x∗) has infinitely many minimizers:x =x∗ + 2pπ , wherep is an integer; see Figure 1.2.

x

y

Figure 1.2:y = −2 cos(x− x∗). Many minimizers.

The function f(x) = 0.015(x − x∗)2 − 2 cos(x − x∗) has a unique globalminimizer,x∗. Besides that, it also has several socalledlocal minimizers,eachgiving the minimal function value inside a certain region, see Figure 1.3.

x

y

x*

Figure 1.3:y = 0.015(x− x∗)2 − 2 cos(x− x∗).One global minimizer and many local minimizers.

The ideal situation for optimization computations is that the objective func-tion has a unique minimizer. We call this theglobal minimizer.

In some cases the objective function has several (or even infinitely many)minimizers. In such problems it may be sufficient to find one of these mini-mizers.

In many objective functions from applications we have a global minimizerand several local minimizers. It is very difficult to develop methods whichcan find the global minimizer with certainty in this situation. Methods forglobal optimization are outside the scope of this lecture note.

The methods described here can find a local minimizer for the objectivefunction. When a local minimizer has been discovered, we do not knowwhether it is a global minimizer or one of the local minimizers. We can-not even be sure that our optimization method will find the local minimizer

� La función f .x/ D �2 cos.x � x�/ tiene mínimos locales en x D x� C 2a� .

1. INTRODUCTION








f(x) = (x− x∗)2



x

y

x*

1. INTRODUCTION 2


x

y



x

y

x*






� f .x/ D 0;015.x � x�/2 tiene un único mínimo global, x�, y muchos locales.

1. INTRODUCTION








f(x) = (x− x∗)2



x

y

x*

1. INTRODUCTION 2


x

y



x

y

x*






h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

7/63Condiciones de mínimo

� La meta de cualquier método de optimización es encontrar el mínimo global dela función, si existe, o un mínimo local.

Definición Una función f W Rn! R se dice convexa si

f .˛x1 C ˇx2/ � f .x1/C f .x2/

para todo x1;x2 2 Rn y todo ˛; ˇ 2 R, con ˛ C ˇ D 1, ˛ � 0, ˇ � 0.

7.4 Convex and Concave Functions 193

y = f(x)

xconvex

(a)

f

xnonconvex

(c)

f

xconvex

(b)

Fig. 7.3 Convex and nonconvex functions

y

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

8/63Teorema Condiciones de convexidad de primer orden Una función f W Rn ! Rderivable —es decir, su gradiente, rf .x/, existe para todo x 2 Rn— es convexa si paratodo x;y 2Rn se cumple que

f .y/ � f .x/Crf .x/T .y � x/.3.1 Basic properties and examples 69

(x, f (x ))

f (y)

f (x ) +∇ f (x )T (y − x )

Figure 3.2 If f is convex and differentiable, then f(x)+∇f(x)T (y−x) ≤ f(y)for all x, y ∈ dom f .

is given by

IC(x) =

{0 x ∈ C∞ x 6∈ C.

The convex function IC is called the indicator function of the set C.

We can play several notational tricks with the indicator function IC . For examplethe problem of minimizing a function f (defined on all of Rn, say) on the set C is thesame as minimizing the function f + IC over all of Rn. Indeed, the function f + ICis (by our convention) f restricted to the set C.

In a similar way we can extend a concave function by defining it to be −∞outside its domain.

3.1.3 First-order conditions

Suppose f is differentiable (i.e., its gradient ∇f exists at each point in dom f ,which is open). Then f is convex if and only if dom f is convex and

f(y) ≥ f(x) +∇f(x)T (y − x) (3.2)

holds for all x, y ∈ dom f . This inequality is illustrated in figure 3.2.The affine function of y given by f(x)+∇f(x)T (y−x) is, of course, the first-order

Taylor approximation of f near x. The inequality (3.2) states that for a convexfunction, the first-order Taylor approximation is in fact a global underestimator ofthe function. Conversely, if the first-order Taylor approximation of a function isalways a global underestimator of the function, then the function is convex.

The inequality (3.2) shows that from local information about a convex function(i.e., its value and derivative at a point) we can derive global information (i.e., aglobal underestimator of it). This is perhaps the most important property of convexfunctions, and explains some of the remarkable properties of convex functions andconvex optimization problems. As one simple example, the inequality (3.2) showsthat if ∇f(x) = 0, then for all y ∈ dom f , f(y) ≥ f(x), i.e., x is a global minimizerof the function f .

Teorema Condiciones de convexidad de segundo orden Una función f W Rn ! R quetiene derivadas parciales de segundo orden —es decir, existe su matriz Hessiana, r2f .x/,para todo x 2 Rn—, es convexa si para todo x 2 Rn ¤ 0 se cumple que xTr2f .x/x � 0,es decir, la Hessiana es semidefinida positiva.

� Ejemplo la función f W R2 ! R, f .x; y/ D x2=y, y > 0

72 3 Convex functions

xy

f(x

,y)

−2

0

2

0

1

20

1

2

Figure 3.3 Graph of f(x, y) = x2/y.

• Negative entropy. x log x (either on R++, or on R+, defined as 0 for x = 0)is convex.

Convexity or concavity of these examples can be shown by verifying the ba-sic inequality (3.1), or by checking that the second derivative is nonnegative ornonpositive. For example, with f(x) = x log x we have

f ′(x) = log x+ 1, f ′′(x) = 1/x,

so that f ′′(x) > 0 for x > 0. This shows that the negative entropy function is(strictly) convex.

We now give a few interesting examples of functions on Rn.

• Norms. Every norm on Rn is convex.

• Max function. f(x) = max{x1, . . . , xn} is convex on Rn.

• Quadratic-over-linear function. The function f(x, y) = x2/y, with

dom f = R×R++ = {(x, y) ∈ R2 | y > 0},

is convex (figure 3.3).

• Log-sum-exp. The function f(x) = log (ex1 + · · ·+ exn) is convex on Rn.This function can be interpreted as a differentiable (in fact, analytic) approx-imation of the max function, since

max{x1, . . . , xn} ≤ f(x) ≤ max{x1, . . . , xn}+ log n

for all x. (The second inequality is tight when all components of x are equal.)Figure 3.4 shows f for n = 2.

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

9/63

Teorema Condiciones necesarias de mínimo local de primer orden Si x� es un mínimolocal de f W Rn! R, se cumple que

rf .x�/ D 0:

� Un punto x en el que rf .x/ D 0 se denomina punto estacionario de f .x/.

Teorema Condiciones suficientes de mínimo local de segundo orden Si x� es un puntoestacionario y r2f .x�/ es definida positiva, x� es un mínimo local.

� IMPORTANTE Cualquier mínimo local de una función convexa es un mínimoglobal.

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

10/63� Se pueden dar distintos casos de mínimos (locales o globales).

5 1. INTRODUCTION

Theorem 1.5. Necessary condition for a local minimum.If x∗ is a local minimizer forf : IRn 7→ IR, then

f ′(x∗) = 0 .

The local minimizers are among the points withf ′(x) = 0. They have aspecial name.

Definition 1.6. Stationary point. If f ′(xs) = 0, thenxs is said to beastationary pointfor f .

The stationary points are the local maximizers, the local minimizers and “therest”. To distinguish between them, we need one extra term in the Taylorexpansion. Provided thatf has continuous third derivatives, then

f(x + h) = f(x) + h>f ′(x) + 12h>f ′′(x)h +O(‖h‖3) , (1.7)

where theHessianf ′′(x) of the functionf is a matrix containing the secondpartial derivatives off :

f ′′(x) ≡[

∂2f

∂xi∂xj(x)

]. (1.8)

Note that this is a symmetric matrix. For a stationary point (1.7) takes theform

f(xs + h) = f(xs) + 12h>f ′′(xs)h +O(‖h‖3) . (1.9)

If the second term is positive for allh we say that the matrixf ′′(xs) ispositive definite(cf Appendix A, which also gives tools for checking def-initeness). Further, we can take‖h‖ so small that the remainder term isnegligible, and it follows thatxs is a local minimizer.

Theorem 1.10. Sufficient condition for a local minimum.Assume thatxs is a stationary point and thatf ′′(xs) is positive definite.Thenxs is a local minimizer.

The Taylor expansion (1.7) is also the basis of the proof of the followingcorollary,

1.1. Conditions for a Local Minimizer 6

Corollary 1.11. Assume thatxs is a stationary point and thatf ′′(x) ispositive semidefinite whenx is in a neighbourhood ofxs. Thenxs is alocal minimizer.

The local maximizers and “the rest”, which we callsaddle points,can becharacterized by the following corollary, also derived from (1.7).

Corollary 1.12. Assume thatxs is a stationary point and thatf ′′(xs) 6= 0. Then

1) if f ′′(xs) is positive definite: see Theorem 1.10,

2) if f ′′(xs) is positive semidefinite:xs is a local minimizer or a saddlepoint,

3) if f ′′(xs) is negative definite:xs is a local maximizer,

4) if f ′′(xs) is negative semidefinite:xs is a local maximizer or asaddle point,

5) if f ′′(xs) is neither definite nor semidefinite:xs is a saddle point.

If f ′′(xs) = 0, then we need higher order terms in the Taylor expansion inorder to find the local minimizers among the stationary points.

Example 1.3. We consider functions of two variables. Below we show the variationof the function value near a local minimizer (Figure 1.5a), a local maximizer(Figure 1.5b) and a saddle point (Figure 1.5c). It is a characteristic of a saddlepoint that there exists one line throughxs, with the property that if we followthe variation of thef -value along the line, this “looks like” a local minimum,whereas there exists another line throughxs, “indicating” a local maximizer.

a) minimum b) maximum c) saddle point

Figure 1.5:With a2-dimensionalx we see surfacesz = f(x) near a stationary point.

1. El primero es un mínimo (local o global) y r2f .x�/ es definida positiva.

2. El segundo es un máximo y r2f .x�/ es definida negativa.

3. El tercero, un punto de silla: se da cuando r2f .x�/ es semidefinidapositiva, semidefinida negativa o indefinida.

� Si r2f .x�/ D 0 se necesita más información de derivadas parciales de tercerorden para determinar los mínimos locales.

05

1015

2025

0

10

20

30−10

−5

0

5

10

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

11/63Obtención de la soluciónMétodos de dirección de descenso

� Siguen un procedimiento iterativo que hace descender el valor de la función,f .xkC1/ < f .xk/ ; en sucesivos puntos k de ese proceso mediante el cálculode unas direcciones de descenso en las que moverse con este objetivo.

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

13/81

Métodos de dirección de descenso

4 Siguen un procedimiento iterativo que hace descender la función,f .xkC1/ < f .xk/ ; en sucesivos puntos k del proceso medianteel cálculo de una dirección de descenso en la que moverse con esteobjetivo.

x(k)p(k)

αkp(k)

x(k+1)xk

˛pk

pkx +1

4 Se diferencian unos de otros en la forma de calcular p.

k

Esquema algorítmico general:

Dados Un x WD x0 y una tol . Hacer found WD false

while (not found) and (k < kmax)Calcular dirección de descenso p

if (p no existe) or (tol)found WD true

elseCalcular ˛: amplitud del paso en p

x WD x C ˛p

endk WD k C 1

end

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

12/63Amplitud de paso (linesearch)

� Supongamos de momento ya calculada la dirección de descenso p en un puntodel proceso; un asunto esencial en ese punto es ¿cuánto moverse a lo largo deesa dirección? ¿Qué paso dar?

� Para calcular ese paso, hay que minimizar en ˛ la función f .x C ˛p/, es decir

minimizar˛

'.˛/ D f .x C ˛p/:

� Esta minimización puede hacerse estrictamente, o de una forma aproximada,esperándose en este caso un coste menor en número de operaciones y tiempo.

� Si se opta por la inexacta, o truncada, hay que garantizar con un indicador que

f .x C ˛p/ < f .x/;

es decir que la función decrezca suficientemente a lo largo de p.

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

13/63

� Hay que evitar pasos muy largos, como en la parte izquierda de la figura con lafunción x2, donde, las direcciones pk D .�1/kC1 y los pasos ˛k D 2C 3=2kC1producen el efecto indicado, desde x0 D 2.

Computing a Step Length αk

The challenges in finding a good αk are both in avoiding that

the step length is too long,

−2 −1.5 −1 −0.5 0 0.5 1 1.5 2

0

0.5

1

1.5

2

2.5

3

(the objective function f(x) = x2 and the iterates xk+1 = xk +αkpk generatedby the descent directions pk = (−1)k+1 and steps αk = 2+3/2k+1 from x0 = 2)

or too short,

−2 −1.5 −1 −0.5 0 0.5 1 1.5 2

0

0.5

1

1.5

2

2.5

3

(the objective function f(x) = x2 and the iterates xk+1 = xk +αkpk generated by the descentdirections pk = −1 and steps αk = 1/2k+1 from x0 = 2).

� También muy cortos, como las direcciones pk D �1 en x2 y los pasos˛k D 1=2kC1, partiendo también de x0 D 2, que producen lo que se ve.

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

14/63� Del desarrollo de Taylor de la función a minimizar se tiene que

'.˛/ D f .x C ˛p/C ˛pTrf .x C ˛p/CO.˛2/y de él que, en ˛ D 0, ' 0.0/ D pTrf .x/.

� La figura que sigue muestra una posible evolución de '.˛/. La expresión de' 0.0/ es la recta A : tangente a f .x C ˛p/ en ˛ D 0. La recta D es '.0/.

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

22/78

108

(a)

α1

f ( )xk

AC

B

( )

α2α

(b)

α1

f ( )xk

C Bf ( )xk+1

α2 αα0α *

θ

αL

Figure 4.14. (a) The Goldstein tests. (b) Goldstein tests satisfied.

estimate α0 can be determined by using interpolation. On the other hand, ifEq. (4.56) is violated, α0 < α1 as depicted in Fig. 4.15b, and since α0 is likelyto be in the range αL < α0 < α∗, α0 can be determined by using extrapolation.

If the value of f(xk+αdk) and its derivative with respect toα are known forα = αL and α = α0, then for α0 > α2 a good estimate for α0 can be deducedby using the interpolation formula

α0 = αL +(α0 − αL)

2f ′L

2[fL − f0 + (α0 − αL)f′L]

(4.57)

f .x/

˛1 ˛2 ˛

f .xC /pD

� El descenso implica simultáneamente que '.˛/ < '.0/ y que '.˛/ > ' 0.0/.

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

15/63

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

22/78

108

(a)

α1

f ( )xk

AC

B

( )

α2α

(b)

α1

f ( )xk

C Bf ( )xk+1

α2 αα0α *

θ

αL

Figure 4.14. (a) The Goldstein tests. (b) Goldstein tests satisfied.

estimate α0 can be determined by using interpolation. On the other hand, ifEq. (4.56) is violated, α0 < α1 as depicted in Fig. 4.15b, and since α0 is likelyto be in the range αL < α0 < α∗, α0 can be determined by using extrapolation.

If the value of f(xk+αdk) and its derivative with respect toα are known forα = αL and α = α0, then for α0 > α2 a good estimate for α0 can be deducedby using the interpolation formula

α0 = αL +(α0 − αL)

2f ′L

2[fL − f0 + (α0 − αL)f′L]

(4.57)

f .x/

˛1 ˛2 ˛

f .xC /pD

� La recta B es la ecuación

f .x C ˛p/ D f .x/C %˛rf .x/Tp; 0 � % < 12;

cuya pendiente en la zona sombreada puede variar entre 0 y 12˛rf .x/Tp.

Representa una fracción, %, (de 0 a 12) de la reducción que promete la

aproximación de Taylor de primer orden en x.

� La recta C es f .xC ˛p/ D f .x/C .1� %/˛rf .x/Tp cuya pendiente abarcala zona sombreada desde rf .x/Tp a 1

2˛rf .x/Tp.

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

16/63

� Los criterios de Armijo y Goldstein de descenso suficiente dicen que el˛ 2 .0; 1/ que se escoja debe ser tal que, para 0 < % < 1

2< � < 1, por

ejemplo % D 0;0001 y � D 0;9,f .x C ˛p/ � f .x/C %˛rf .x/Tp Armijo

yf .x C ˛p/ � f .x/C �˛rf .x/Tp Goldstein

� Otro criterio es el de Wolfe, con dos condiciones también. La primera es igual ala de Armijo, la segunda

rf .x C ˛p/Tp � �rf .x/Tp:

También denominada de curvatura. Indica que la pendiente de la función debeser menor en el nuevo punto.

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

17/63

� El procedimiento numérico inexacto más extendido para calcular la amplitud depaso ˛ se conoce como backtracking .

� Comienza con un paso completo, ˛ D 1, y lo va reduciendo mediante un factorconstante, ˇ � ˛, ˇ 2 .0; 1/, hasta que se cumplan los criterios de Armijo yGoldstein, o uno de los dos: preferentemente el de Armijo.

� Funciona sólo si f .x C ˛p/0˛D0 D rf .x/Tp < 0 (dirección de descenso).

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

18/63

Método de la dirección del gradiente o de la máximapendiente

� Consideremos el desarrollo en serie de Taylor de f .x/ hasta primer orden:

f .x C p/ D f .x/Crf .x/Tp CO �kpk2� :� La dirección p en x es una dirección de descenso si

rf .x/Tp < 0:

� El descenso relativo de la función en p es

f .x/ � f .x C p/

kpk D �rf .x/Tp

kpk D �krf .x/k cos �

donde � es el ángulo que forman p y rf .x/.

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

19/63� El descenso cualitativo será máximo cuando � D � : donde cos.�/ D �1:cuando la dirección de descenso es

p D �rf .x/

denominada de máxima pendiente.

3 Line search methods

Iteration: xk+1 = xk + αkpk, where αk is the step length (how far to move along pk), αk > 0; pkis the search direction.

−2 −1.5 −1 −0.5 0 0.5 1 1.5 21

1.5

2

2.5

3

xkpk

Descent direction: pTk∇fk = ‖pk‖ ‖∇fk‖ cos θk < 0 (angle < π2with −∇fk). Guarantees that f

can be reduced along pk (for a sufficiently smooth step):

f(xk + αpk) = f(xk) + αpTk∇fk +O(α2) (Taylor’s th.)< f(xk) for all sufficiently small α > 0

• The steepest descent direction, i.e., the direction along which f decreases most rapidly, is fig. 2.5fig. 2.6pk = −∇fk. Pf.: for any p, α: f(xk + αp) = f(xk) + αpT∇fk + O(α2) so the rate of

change in f along p at xk is pT∇fk (the directional derivative) = ‖p‖ ‖∇fk‖ cos θ. Thenminp p

T∇fk s.t. ‖p‖ = 1 is achieved when cos θ = −1, i.e., p = −∇fk/ ‖∇fk‖.This direction is ⊥ to the contours of f . Pf.: take x+p on the same contour line as x. Then, by Taylor’s th.:

f(x+ p) = f(x) + pT∇f(x) + 1

2pT∇2f(x+ ǫp)p, ǫ ∈ (0, 1) ⇒ cos∠(p,∇f(x)) = −1

2

pT∇2f(x+ ǫp)p

‖p‖ ‖∇f(x)‖ −−−−−→‖p‖→0

0

but ‖p‖ → 0 along the contour line means p/ ‖p‖ is parallel to its tangent at x.

• The Newton direction is pk = −∇2f−1k ∇fk. This corresponds to assuming f is locally

quadratic and jumping directly to its minimum. Pf.: by Taylor’s th.:

f(xk + p) ≈ fk + pT∇fk +1

2pT∇2fkp = mk(p)

which is minimized (take derivatives wrt p) by the Newton direction if ∇2fk is pd. (✐ what

happens if assuming f is locally linear (order 1)?)In a line search the Newton direction has a natural step length of 1.

• For most algorithms, pk = −B−1k ∇fk where Bk is symmetric nonsingular:

6

� Algoritmo de máxima pendiente.


while (not found) and (k < kmax)Calcular dirección de descenso p D �g D �rf .x/if (p no existe) or (tol)

found WD true

elseCalcular amplitud de paso ˛ con backtrackingx WD x C ˛p

endk WD k C 1

end

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

20/63

� La convergencia es lineal.

� Ejemplo Resolvamos minimizarx2R2

ex1C3x2�0;1Cex1�3x2�0;1C e�x1�0;1.

function [x,f] = Maxima_pendiente_unc(fun,x0)% Método de la máxima pendienterho = 0.1; beta = 0.5; % Parámetros y partidaf1=0; maxit = 100; x = x0;

for i=1:maxit % Proceso iterativo[f,g] = fun(x);if (abs(f-f1) < 1e-10), break, endp = -g; alpha = 1;for k=1:10 % Amplitud de paso con backtracking

xnew = x+alpha*p; fxnew = fun(xnew);if fxnew < f + alpha*rho*g’*p

breakelse alpha = alpha*beta;end

endx = x + alpha*p; f1=f;fprintf(’%4.0f %13.8e %13.8e %13.8e %13.8e\n’,i,x,f,alpha);

endend

function [f g]= objfun_min1(x)A = [1 3; 1 -3; -1 0]; b = -0.1*[1; 1; 1];f = sum(exp(A*x+b)); if nargout<2, return, endg = A’*exp(A*x+b);

end

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

21/63

� Su ejecución da como resultado:

>> [x f]=Maxima_pendiente_unc(@objfun_min1,[-1;1])1 -1.26517900e+000 -2.50497831e-001 9.16207023e+000 6.25000000e-0022 -6.29000734e-001 6.51924176e-002 3.86828053e+000 2.50000000e-0013 -4.50514899e-001 -7.72284882e-002 2.68052760e+000 2.50000000e-0014 -4.21089848e-001 2.38719166e-002 2.60419641e+000 1.25000000e-0015 -3.97610304e-001 -8.05335008e-003 2.56942254e+000 1.25000000e-0016 -3.65030711e-001 1.39821003e-002 2.56295544e+000 2.50000000e-0017 -3.59263955e-001 -5.78404615e-003 2.56080796e+000 1.25000000e-0018 -3.55227870e-001 2.43800662e-003 2.55966300e+000 1.25000000e-0019 -3.52463501e-001 -1.04150522e-003 2.55939647e+000 1.25000000e-001

10 -3.48696568e-001 1.93956544e-003 2.55931730e+000 2.50000000e-00111 -3.48020112e-001 -8.46703041e-004 2.55929408e+000 1.25000000e-00112 -3.47557873e-001 3.70439501e-004 2.55927350e+000 1.25000000e-00113 -3.47243091e-001 -1.62316050e-004 2.55926873e+000 1.25000000e-00114 -3.47028931e-001 7.11957301e-005 2.55926742e+000 1.25000000e-00115 -3.46737604e-001 -1.33695923e-004 2.55926699e+000 2.50000000e-00116 -3.46685147e-001 5.87394995e-005 2.55926683e+000 1.25000000e-00117 -3.46649462e-001 -2.58117184e-005 2.55926673e+000 1.25000000e-00118 -3.46625190e-001 1.13436901e-005 2.55926671e+000 1.25000000e-00119 -3.46592176e-001 -2.13150939e-005 2.55926670e+000 2.50000000e-00120 -3.46586231e-001 9.36927894e-006 2.55926670e+000 1.25000000e-00121 -3.46582187e-001 -4.11844758e-006 2.55926670e+000 1.25000000e-00122 -3.46579437e-001 1.81036718e-006 2.55926670e+000 1.25000000e-00123 -3.46577566e-001 -7.95799575e-007 2.55926670e+000 1.25000000e-001

x =-0.346577566436640-0.000000795799575

f =2.559266696682093

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

22/63� En las siguientes gráficas se puede ver el comportamiento del proceso iterativo.El cálculo de la amplitud se hace mediante backtracking y, exactamente, por elmétodo de la bisección.

0 5 10 15 20 2510

−15

10−10

10−5

100

105

inexacta: backtracking

line search exacta: bisecc.

k

erro

r

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

23/63� Utilizando la rutina fminunc de Matlab, se tendría lo que sigue.

» x0=[-1;1];» options = optimset(’Display’,’iter’,’GradObj’,’on’,’LargeScale’,’off’);» [x,fval,exitflag,output] = fminunc(@objfun_min1,x0,options)

First-orderIteration Func-count f(x) Step-size optimality

0 1 9.16207 201 2 3.57914 0.0499801 2.52 3 3.31537 1 2.113 4 2.60267 1 0.5484 5 2.56573 1 0.3495 6 2.55954 1 0.06136 7 2.55928 1 0.0117 8 2.55927 1 0.0001448 9 2.55927 1 1.88e-007

Optimization terminated: relative infinity-norm of gradient less than options.TolFun.x =

-0.3466-0.0000

fval =2.5593

exitflag =1

output =iterations: 8funcCount: 9stepsize: 1

firstorderopt: 1.8773e-007algorithm: ’medium-scale: Quasi-Newton line search’

message: [1x85 char]

� La función objetivo y su gradiente están enfunction [f g]= objfun_min1(x)% f(x) = sum(exp(A*x+b))A = [1 3; 1 -3; -1 0]; b = -0.1*[1;1;1];f = sum(exp(A*x+b));g = A’*exp(A*x+b);

end

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

24/63Método de Newton

� Consideremos el desarrollo de Taylor de f .x/ hasta segundo orden de derivadas:

f .x C p/ D f .x/Crf .x/Tp C 12

pTr2f .x/p CO �kpk3� ;donde g D rf .x/ es el vector gradiente y la matriz

H D r2f .x/ D

2666664@2f .x/@2x1

@2f .x/@x1@x2

� � � @2f .x/@x1@xn

@2f .x/@x2@x1

@2f .x/@2x2

� � � @2f .x/@x2@xn

::::::

: : ::::

@2f .x/@xn@x1

@2f .x/@xn@x2

� � � @2f .x/@2xn

3777775,la Hessiana.

� La idea:

Newton�’s Method

!


!


!


h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

25/63

� La condición necesaria de óptimo de ese desarrollo, rf .x�/ D 0, conduce a laecuación

rf .x/Cr2f .x/p D g CH p D 0.

Sistema lineal cuya solución es la dirección de Newton hacia el óptimo.

� Si la matriz H D r2f .x/ es definida positiva (xTH x > 0 para cualquierx ¤ 0), la dirección de Newton es una dirección de descenso pues

0 < pTH p D �pTg;

cumpliéndose así que pTg < 0, como dedujimos.

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

26/63

� Algoritmo de Newton para minimizar f .x/ y script de Matlab:


while (not found) and (k < kmax)Calcular dirección de descenso; resolver H p D �g


elseCalcular la amplitud de paso ˛ con backtrackingx WD xC ˛p

endk WD kC 1

end

function [x,f] = Newton_unc_1(fun,x)% Método de Newtonrho = 0.1; beta = 0.5; % Parámetros de line searchmaxit = 100;

for i=1:maxit % Proceso interativo[f,g,H] = fun(x);p = -H\g;if abs(p’*g) < 1e-8, break, endalpha = 1;for k=1:50 % Cálculo inexacto amplitud de paso

xnew = x+alpha*p;fxnew = fun(xnew);if fxnew < f+alpha*rho*g’*p, breakelse alpha=alpha*beta;end

endx = x + alpha*p;fprintf(’%4.0f %13.8e %13.8e %13.8e %13.8e\n’,i,x,f,alpha);

endend

function [f g H] = objfun_min2(x)A = [1 3; 1 -3; -1 0]; b = -0.1*[1; 1; 1];f = sum(exp(A*x+b)); if nargout<2, return, endg = A’*exp(A*x+b); H = A’*diag(exp(A*x+b))*A;

end

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

27/63� Para resolver otra vez minimizarx2R2

ex1C3x2�0;1 C ex1�3x2�0;1 C e�x1�0;1.

>> [x f]=Newton_unc(@objfun_min2,[-1;1])1 -5.23625188e-002 3.53998022e-001 9.16207023e+000 1.00000000e+0002 -1.05634526e-001 1.05820897e-001 3.73378771e+000 1.00000000e+0003 -3.18485379e-001 2.52139713e-002 2.71665315e+000 1.00000000e+0004 -3.45138214e-001 7.18724132e-004 2.56404324e+000 1.00000000e+0005 -3.46572427e-001 1.03191597e-006 2.55927231e+000 1.00000000e+000

x =-0.3465724270276440.000001031915967

f =2.559266696666079

La convergencia del método y elerror vs iteración son estos:

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 510

−15

10−10

10−5

100

105

k

erro

r

� Si la Hessiana es definida positiva, la convergencia del método de Newton escuadrática. Funciona especialmente bien en las proximidades del óptimo o unpunto estacionario.

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

28/63Combinación de máxima pendiente y Newton

� La dirección de Newton es siempre de descenso si la matriz Hessiana es definidapositiva.

� Un algoritmo híbrido de máxima pendiente y Newton, para puntos de arranquelejanos donde no se de esa condición de la Hessiana, podría mejorar lasprestaciones del método de Newton.

if H .x/ es definida positivap D pN

elsep D pmp

endx WD x C ˛p

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

29/63

� En Matlab:

function [x f i] = Newton_mp(fun,x)% Método híbrido Newton-máxima pendienteglobal hrho = 0.1; beta = 0.5; % Parámetros de line searchmaxit = 100; h=sqrt(eps);for i=1:maxit % Proceso iterativo

[f g H] = fun(x);[R npd] = chol(H); % Cholesky comprueba si H es definida positivaif ~npd

p = -R\(R’\g); % Dirección de Newton_mpelse

p = -g; % Dirección de máxima pendienteendif abs(p’*g)<1e-8, break, endalpha = 1;for k=1:10 % Backtracking de amplitud de paso

xnew = x+alpha*p;fxnew = fun(xnew);if fxnew < f+alpha*rho*g’*p, breakelse alpha=alpha*beta;end

endx = x + alpha*p;fprintf(’%3.0f %13.5e %13.5e %13.5e %13.5e %2.0f\n’,i,x,f,alpha,npd);

endend

function [f g H] = objfun_min3(x)% Función de Rosenbrock: 100(x2-x1^2)^2+(1-x1)^2global hf = 100*(x(2)-x(1)^2)^2+(1-x(1))^2; if nargout<2, return, endg = [-200*(x(2)-x(1)^2)*2*x(1)-2*(1-x(1));200*(x(2)-x(1)^2)];if nargout<3, return, endx1=[x(1)+h;x(2)]; [f1 g1]=objfun_min3(x1);H(1:2,1)=(g1-g)/h;x1=[x(1);x(2)+h]; [f1 g1]=objfun_min3(x1);H(1:2,2)=(g1-g)/h;

end

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

30/63� Si lo utilizamos para resolver la complicada función de Rosenbrock

minimizarx2R2

100�x2 � x21

�2 C .1 � x1/2:

−0.5 −0.4 −0.3 −0.2 −0.1 0 0.1 0.2 0.3 0.4 0.5−0.5

−0.4

−0.3

−0.2

−0.1

0

0.1

0.2

0.3

0.4

0.5

x1

x 2

1

1

1

1

2

2

2

2

22

2

3

33

3

3

3

3

3

4

4 4

4

4

4

4

4

5

55

5

5

5 5

5

10

10

10

10

1010

10

15

15

15

15

1515

15

Figure 5: Contour plot of Rosenbrock’s banana function.

where K is the cone of 6 × 6 PSD matrices. Following the terminology introduced in[7, 8], the above matrix is referred to as the moment, or measure matrix associated withthe LMI relaxation. Because the above moment matrix contains relaxations of monomialsof degree up to 2+2=4, it is referred to as the second-degree moment matrix. The aboveupper-left 3x3 submatrix contains relaxations of monomials of degree up to 1+1=2, so itis referred to as the first-degree moment matrix.

Now replacing the monomials in the criterion by their relaxed variables, the first LMIrelaxation of Rosenbrock’s banana function minimization reads

max −1 + 2y10 − y20 − 100y02 + 200y21 − 100y40

s.t.

1 y10 y01 y20 y11 y02y10 y20 y11 y30 y21 y12y01 y11 y02 y21 y12 y03y20 y30 y21 y40 y31 y22y11 y21 y12 y31 y22 y13y02 y12 y03 y22 y13 y04

∈ K.

For a comprehensive description of the way LMI relaxations are build (relaxations ofhigher orders, moment matrices of higher degrees and moment matrices associated withconstraints), the interested reader is advised to consult [7, 8]. All we need to know hereis that an LMI relaxation of a non-convex optimization problem can be expressed as a

17

function [f g H] = objfun_min3(x)% Función de Rosenbrock: 100(x2-x1^2)^2+(1-x1)^2global hf = 100*(x(2)-x(1)^2)^2+(1-x(1))^2; if nargout<2, return, endg = [-200*(x(2)-x(1)^2)*2*x(1)-2*(1-x(1));200*(x(2)-x(1)^2)];if nargout<3, return, endx1=[x(1)+h;x(2)]; [f1 g1]=objfun_min3(x1);H(1:2,1)=(g1-g)/h;x1=[x(1);x(2)+h]; [f1 g1]=objfun_min3(x1);H(1:2,2)=(g1-g)/h;end

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

31/63

� Partiendo de un punto alejado se obtiene esto.

>> [x f k] = Newton_mp(@objfun_min3,[0;15])1 1.56250e-002 -8.43750e+000 2.25010e+004 7.81250e-003 12 1.62080e-002 2.62358e-004 7.12052e+003 1.00000e+000 03 2.62139e-001 8.23454e-003 9.67847e-001 2.50000e-001 04 3.18480e-001 9.82550e-002 9.10251e-001 1.00000e+000 05 5.26915e-001 2.32607e-001 4.65478e-001 5.00000e-001 06 5.74193e-001 3.27462e-001 4.26601e-001 1.00000e+000 07 7.21323e-001 4.97541e-001 1.81811e-001 5.00000e-001 08 7.71508e-001 5.92706e-001 1.29485e-001 1.00000e+000 09 8.47484e-001 7.11197e-001 5.28430e-002 5.00000e-001 0

10 9.10865e-001 8.25658e-001 2.82056e-002 1.00000e+000 011 9.60290e-001 9.19713e-001 9.55884e-003 1.00000e+000 012 9.86967e-001 9.73391e-001 2.17363e-003 1.00000e+000 013 9.98376e-001 9.96624e-001 2.20516e-004 1.00000e+000 014 9.99959e-001 9.99915e-001 4.33199e-006 1.00000e+000 0

x =0.9999587782014350.999915052848790

f =2.326866095161381e-009

k =15

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

32/63

Métodos de Newton Amortiguado y deRegión de Confianza

� Supongamos que un modelo M.p/ representa bien1 el comportamiento de lafunción f que queremos minimizar en el entorno de un punto, es decir,

f .x C p/ 'M.p/ defD f .x/C pT c C 12

pTBp;

donde c 2 Rn y la matriz B 2 Rn�n es simétrica.

� La idea es utilizar esos modelos para calcular direcciones de descenso que sepuedan modular a voluntad, y con amplitudes de paso D 1.Estudiaremos dos variantes.

1Tal modelo puede ser perfectamente el desarrollo en serie de Taylor hasta segundas derivadas, o una apro-ximación adecuada.

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

33/63

� La primera se conoce como Newton amortiguado. Calcula en cada punto delproceso iterativo un paso

pNa

defD mKınp

�M.p/C 1

2�pTp

�;

donde � es un parámetro de amortiguación que, aplicado como se indica,penaliza pasos amplios.

� La segunda, la región de confianza,

prc

defD mKınkpk � �

M.p/;

pues se restringe el modelo a una región donde se confía que plasme bien lascaracterísticas del problema.

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

34/63� Si la función decrece convenientemente en cualquiera de estas direcciones, sehace x C p como el nuevo punto del procedimiento y se adapta � o �.

� Si no decrece lo suficiente, y no se ha llegado al óptimo, se modifica � o � paraque en la próxima iteración el avance hacia el óptimo mejore.

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

41/85

C H A P T E R 4 . T R U S T - R E G I O N M E T H O D S 67

minimizer. In general, the direction of the step changes whenever the size of the trust regionis altered.

The size of the trust region is critical to the effectiveness of each step. If the region istoo small, the algorithm misses an opportunity to take a substantial step that will move itmuch closer to the minimizer of the objective function. If too large, the minimizer of themodel may be far from the minimizer of the objective function in the region, so we may haveto reduce the size of the region and try again. In practical algorithms, we choose the size ofthe region according to the performance of the algorithm during previous iterations. If themodel is consistently reliable, producing good steps and accurately predicting the behaviorof the objective function along these steps, the size of the trust region may be increased toallow longer, more ambitious, steps to be taken. A failed step is an indication that our modelis an inadequate representation of the objective function over the current trust region. Aftersuch a step, we reduce the size of the region and try again.

Figure 4.1 illustrates the trust-region approach on a function f of two variables inwhich the current point xk and the minimizer x∗ lie at opposite ends of a curved valley.The quadratic model function mk , whose elliptical contours are shown as dashed lines, isconstructed from function and derivative information at xk and possibly also on informationaccumulated from previous iterations and steps. A line search method based on this modelsearches along the step to the minimizer of mk (shown), but this direction will yield at mosta small reduction in f , even if the optimal steplength is used. The trust-region methodsteps to the minimizer of mk within the dotted circle (shown), yielding a more significantreduction in f and better progress toward the solution.

In this chapter, we will assume that the model function mk that is used at eachiterate xk is quadratic. Moreover, mk is based on the Taylor-series expansion of f around

kcontours of

contours of f

Trust region step

Trust region

m

Line search direction

Figure 4.1 Trust-region and line search steps.

Región de confianzaPaso de Newton

Contornos del modelo M.p/

Contornos de f .x/Paso de región deconfianza

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

35/63

� La calidad del modelo elegido se evalúa mediante la ganancia

% D f .x/ � f .x C p/

M.0/ �M.p/ ;

es decir, la relación entre lo que decrece la función desde x y lo que preveía elmodelo que lo haría.

� Con el modelo región de confianza, si % < 14, la ganancia es pequeña y se

deberían reducir los pasos, por ejemplo a la mitad, mientras que si % > 34se

podrían aumentar.

if % < 0;25� WD �=2

elseif % > 0;75� WD mKax f�; 3 � kpkg

end

Los factores 2 y 3 de esta estrategia no deben hacer oscilar la región �.

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

36/63

� Si con el modelo Newton amortiguado % es pequeño, se debería aumentar elparámetro de amortiguación, �, aumentando así la penalización por dar pasosgrandes. Un % grande, por el contrario, indicará que M.p/ es una buenaaproximación de f .x C p/ y se puede reducir �.

if % < 0;25� WD � � 2

elseif % > 0;75� WD �=3

end

� Otra estrategia para � es:

if % > 0� WD � �mKax

˚13; 1 � .2% � 1/3

else� WD � � 2

end

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

37/63

Newton amortiguado. Cálculo de la dirección

� Se calcula determinando

.p/ D mKınp

�M.p/C 1

2�pTp

�:

� La condición necesaria de mínimo, r .p�/ D 0, hace que la dirección pNa seala solución de

0.h/ DM 0.h/C �p D 0:

lo que es equivalente, de acuerdo con la definición de M.p/ a

.B C �I/pNa D �c:

Si � es suficientemente grande, la matriz simétrica B C �I es definidapositiva, por lo que pNa es un mínimo del modelo M.p/.

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

38/63

� En Newton amortiguado B D H y c D g, y el sistema es

.H C �I/pNa D �g W

8ˆ<ˆˆ:

Si � es suficientemente grande,

pNa � �1

�g;

por lo que la dirección es próxima a la demáxima pendiente.

Si � es pequeño, la dirección es casi la deNewton.

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

39/63

� Algoritmo de Newton amortiguado para minimizar f .x/ y en Matlab::

Dados Un x WD x0 y tol . Hacer � D 1; � D 2 y found WD f

while (not found) and (k < kmax)Resolver dirección de descenso .H C�I/p D �g


elseCalcular % D .f .x/� f .xCp// = .M.0/�M.p//if % > 0� WD � �mKax

˚13; 1� .2%� 1/3

x WD xCp

else� WD � � 2

endendk WD kC 1

end

function [x f k] = Newton_amortiguado(fun,x)% Método de Newton amortiguado con paso completoglobal hk=0; kmax=500; eps1=1e-9; eps2=1e-12; n=length(x); h=eps1;[f g H] = fun(x); ng = norm(g,inf); mu = 1; found = ng<=eps1;while ~found

posdef = 0;while ~posdef

[R p] = chol(H+mu*eye(n)); % Se comprueba si H es def. positivaif p, mu=2*mu; else posdef = 1; end

endp = -R\(R’\g); nh = norm(p); % Cálculo de pasoif norm(g,inf)<eps1 || nh <= eps2*(eps2 + norm(x)), found=2; % Óptimoelse % Modificar región de confianza

xnew = x + p; dL = -p’*g-p’*(H*p)/2;[fn gn Hn] = fun(xnew); df = f-fn; r=df/dL;if r>0

x = xnew; f = fn; g = gn; H = Hn;ng = norm(g,inf);mu = mu*max(1/3,1-(2*r-1)^3);

elsemu = mu*2;

endk = k + 1;if ng <= eps1, found = 1; elseif k > kmax, found = 3; endfprintf(’%4.0f %13.8e %13.8e %13.8e %13.8e\n’,k,x,f,mu);

endend

end

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

40/63� Resolvamos con este algoritmo

minimizarx2R2

0;5x21.x21=6C 1/C x2 arctan.x2/ � 0;5 ln.x22 C 1/

>> [x f k] =Newton_amortiguado(@fun5_2,[2;2])2 1.22222222e+000 1.07737607e+000 1.43392805e+000 3.33333333e-0013 5.74640142e-001 4.41028668e-002 1.75164488e-001 1.11111111e-0014 1.32066691e-001 4.36656594e-003 8.75568971e-003 3.70370370e-0025 6.09495146e-003 1.55898129e-004 1.85864837e-005 1.23456790e-0026 7.44750479e-005 1.90119424e-006 2.77507370e-009 4.11522634e-0037 3.05225879e-007 7.79177968e-009 4.66421303e-014 1.37174211e-0038 4.18117642e-010 1.06736708e-011 8.75251085e-020 4.57247371e-004

x =1.0e-009 *0.4181176418942180.010673670797581

f =8.752510847988606e-020

k =8

function [f g H]=fun5_2(x)f=0.5*x(1)^2*(x(1)^2/6+1)+x(2)*atan(x(2))-0.5*log(x(2)^2+1);g=[x(1)^3/3+x(1); atan(x(2))];H(1,1)=x(1)^2+1;H(1,2)=0;H(2,1)=0;H(2,2)=1/(1+x(2)^2);

end

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

41/63

Región de confianza. Cálculo de la dirección

Teorema El vector p� es la solución de

prc

defD mKınkpk��

M.p/defD f .x/C gTp C 1

2pTBp

si y sólo si p� es factible y existe un escalar � � 0 tal que se cumple que

1. .B C �I/p� D �g,

2. �.� � kp�k/ D 0 y

3. .B C �I/ es semidefinida positiva.

B es la matriz Hessiana, r2f .x/, o una aproximación.

� Este es un problema de optimización con condiciones. La variable � es elmultiplicador de Lagrange de la condición que impone la región de confianza.

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

42/63� La segunda condición, de complementariedad, establece que � D 0 okp�k D �.

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

49/80

70 C H A P T E R 4 . T R U S T - R E G I O N M E T H O D S

Theorem 4.1.The vector p∗ is a global solution of the trust-region problem

minp∈IRn

m(p) � f + gT p + 12 pT Bp, s.t. ‖p‖ ≤ �, (4.7)

if and only if p∗ is feasible and there is a scalar λ ≥ 0 such that the following conditions aresatisfied:

(B + λI )p∗ � −g, (4.8a)

λ(�− ||p∗||) � 0, (4.8b)

(B + λI ) is positive semidefinite. (4.8c)

We delay the proof of this result until Section 4.3, and instead discuss just its keyfeatures here with the help of Figure 4.2. The condition (4.8b) is a complementarity conditionthat states that at least one of the nonnegative quantities λ and (� − ‖p∗‖) must be zero.Hence, when the solution lies strictly inside the trust region (as it does when � � �1 inFigure 4.2), we must have λ � 0 and so Bp∗ � −g with B positive semidefinite, from (4.8a)and (4.8c), respectively. In the other cases � � �2 and � � �3, we have ‖p∗‖ � �, andso λ is allowed to take a positive value. Note from (4.8a) that

λp∗ � −Bp∗ − g � −∇m(p∗).

m

1

contours of

*3p

∆

∆

∆

2

3

p *2p*1

Figure 4.2 Solution of trust-region subproblem for different radii �1, �2, �3.

Contornos del modelo M.p/

� Indica que cuando p� está estrictamente dentro de la región deconfianza (como es el caso en la figura de � D �1), se debecumplir que � D 0 y por tanto que Bp� D �g, siendo B

semidefinida positiva.

Indica que cuando p� está estrictamen-te dentro de la región de confianza (co-mo es el caso en la figura de � D �1),se debe cumplir que � D 0 y por tantoque Bp� D �g, siendo B semidefinidapositiva.

� Cuando � D �2 y � D �3, se tiene que kp�k D � y � será > 0.

� De la primera condición se deduce que �p� D �Bp� � g D �rM.p�/. Esdecir, cuando � > 0, la solución p� es colineal con el negativo del gradiente deM (máxima pendiente) y perpendicular a sus perfiles de contorno, como en lafigura.

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

43/63

� Las soluciones de los subproblemas M.p/ no tienen por qué ser exactas. Existenvarias formas de aproximarlas:

Punto de Cauchy : Mínimo a lo largo de p D �g, acotada a �

Dogleg (pata de perro): si B es definida positiva

Minimización en un subespacio de dimensión 2 : Si B es indefinida

Steihaug : Si B es simétrica

Otras: : :

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

44/63Punto de Cauchy

� La dirección es la solución del modelo Mk.p/ de máxima pendiente:

pck D ��k

�k

kgkkgk;

donde

�k D8<:1 si gTkBgk � 0

mKın�

kgkk3.�kgT

kBkgk/

; 1

�si gTkBgk > 0:

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

52/78


It is easy to write down a closed-form definition of the Cauchy point. For a start, thesolution of (4.9) is simply

pSk � − �k

‖gk‖gk .

To obtain τk explicitly, we consider the cases of gTk Bk gk ≤ 0 and gT

k Bk gk > 0 separately. Forthe former case, the function mk(τ pS

k) decreases monotonically with τ whenever gk � 0,so τk is simply the largest value that satisfies the trust-region bound, namely, τk � 1. Forthe case gT

k Bk gk > 0, mk(τ pSk) is a convex quadratic in τ , so τk is either the unconstrained

minimizer of this quadratic, ‖gk‖3/(�k gTk Bk gk), or the boundary value 1, whichever comes

first. In summary, we have

pCk � −τk

�k

‖gk‖gk, (4.11)

where

τk �{

1 if gTk Bk gk ≤ 0;

min(‖gk‖3/(�k gT

k Bk gk), 1)

otherwise.(4.12)

Figure 4.3 illustrates the Cauchy point for a subproblem in which Bk is positivedefinite. In this example, pC

k lies strictly inside the trust region.The Cauchy step pC

k is inexpensive to calculate—no matrix factorizations arerequired—and is of crucial importance in deciding if an approximate solution of thetrust-region subproblem is acceptable. Specifically, a trust-region method will be globally

—

kC

mk

gk

Trust region

contours of

p

Figure 4.3 The Cauchy point.

Contornos del modelo Mk.p/

Región de confianza

�gk

pck

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

45/63DogLeg

� Hemos visto que si la matriz B del modelo M.p/ es definida positiva, el mínimode éste hace pB D �B�1g. Es decir, cuando � � kpBk, p�.�/ D pB .

� Si � es pequeña con respecto al paso pB , la condición kpk � � hace que eltérmino cuadrático del modelo M.p/ tenga poco efecto en su solución.En este caso, se puede aproximar p�.�/ � �� g

kgk:

� Para otros valores la solución p�.�/ sigue una trayectoria curvilínea:

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

56/81


)∆

pB full step( )

—g)pU

—g

Trust region

pOptimal trajectory

dogleg path

unconstrained min along(

(

Figure 4.4 Exact trajectory and dogleg approximation.

by simply omitting the quadratic term from (4.5) and writing

p∗(�) ≈ −�g

‖g‖ , when � is small. (4.14)

For intermediate values of �, the solution p∗(�) typically follows a curved trajectory likethe one in Figure 4.4.

The dogleg method finds an approximate solution by replacing the curved trajectoryfor p∗(�) with a path consisting of two line segments. The first line segment runs from theorigin to the minimizer of m along the steepest descent direction, which is

pU � − gT g

gT Bgg, (4.15)

while the second line segment runs from pU to pB (see Figure 4.4). Formally, we denote thistrajectory by p(τ ) for τ ∈ [0, 2], where

p(τ ) �{

τ pU, 0 ≤ τ ≤ 1,

pU + (τ − 1)(pB − pU), 1 ≤ τ ≤ 2.(4.16)

The dogleg method chooses p to minimize the model m along this path, subject tothe trust-region bound. The following lemma shows that the minimum along the doglegpath can be found easily.

Trayectoria óptima de p.�/

Región de confianza

�g

pB (paso completo)pU : mínimo en la dirección �g

dirección DogLeg

4El método DogLeg reemplaza la trayectoria curvilínea por dos segmentos delínea recta. El primero hasta el mínimo de M .p/ en la dirección de máximapendiente

pU D � gTg

gTBgg:

El método DogLeg reemplaza la trayectoria curvilíneapor dos segmentos de recta. El primero hasta el míni-mo de M .p/ en la dirección de máxima pendiente

pU D � gTg

gTBgg:

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

46/63� El segundo desde pU hasta pB . En conjunto a esta trayectoria se le denominaQp.�/, con � 2 Œ0; 2�, donde Qp.�/ D

��pU 0 � � � 1pU C .� � 1/ �pB � pU

�1 � � � 2:

� La intersección de esta trayectoria con la región de confianza es el punto que sebusca; concretamente el � que resuelve la ecuación pU C .� � 1/ �pB � pU

� 2 D �2:

�Algoritmo de minimización de una función median-te el método de región de confianza DogLeg conmodelo de Newton.

Dados Un x WD x0 y tol . Hacer � D 1 y found WD false

while (not found) and (k < kmax)Resolver H pN D �gif pN � �, x WD x C pN

else pmp D �.gTg=gTH g/gif pmp > �, x WD x C pc

elseCalcular dir. DogLeg e intersección �x WD x C pmp C �.pN � pmp/

endendCalcular % D .f .x/ � f .x C p// = .M.0/ �M.p//if % < 0;25, � WD �=2elseif % > 0;75, � WD mKaxf� � 2;�maxgendk WD k C 1, if found , exit

end

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

47/63� Un código de Matlab:

function [xc fc] = Dogleg_UBC_yo(fun,x0)% dogleg.m -- Método de Región de Confianza con modelo Newtoneps1=100*sqrt(eps); eps2=sqrt(eps); Kmax=200; Delta=0.5; Dmax=100;xstring = ’’;for jj=1:length(x0)

xstring = [xstring,’x_k(’,int2str(jj),’) ’];endfprintf([’\n k %s f(x_k) Delta |dx|/Del ro ’,...

’ e_1 e_2\n’],xstring);xc = x0; [fc gc Hc] = fun(xc);disp([’ 0’, sprintf(’ %12.4e’,[xc’,fc]),sprintf(’ %10.2e’,Delta)]);for k=1:Kmax % Modelo Newton: -(gc’*pN + 0.5*pN’*Hc*pN);

pN = - Hc\gc; % Dirección de NewtonpNlen = (pN’*pN)^0.5; gHg = gc’*Hc*gc;if pNlen <= Delta % Punto Newton dentro región

xt = xc + pN;else % Fuera: encontrar el mínimo mediante máxima

% pendiente del Model NewtonpSD = -(gc’*gc)/gHg*gc;pSDlen = (pSD’*pSD)^0.5;if pSDlen>=Delta % Newton y mp fuera región: usa punto Cauchy

xt = xc - Delta*gc/(gc’*gc)^0.5;else % mp dentro región y Newton fuera: usa DogLeg

% en límite región de linea uniéndolospN_pSD = pN-pSD;a = pN_pSD’*pN_pSD; b = 2*pN_pSD’*pSD; c = pSD’*pSD - Delta^2;t = (-b+(b^2-4*a*c)^0.5)/2/a;xt = xc + pSD + t*pN_pSD;

endend[fn gn Hn] = fun(xt); % Nuevo punto obtenido en Reg. Con.dx = xt - xc; df = fn - fc;redfun = -df; % Reducción obtenida en funciónrepred = -gc’*dx-0.5*dx’*Hc*dx; % Reducción predicha modelo Newton

rho = redfun/repred; % Gananciae1 = max(abs(pN’)./max([abs(xc’);eps2*ones(size(xc’))]));e2 = max((abs(gn).*abs(xt))/ max([abs(fn),eps2]));if rho>0 % Buena ganancia; aceptar punto

xn=xt; xc=xt; fc=fn; gc=gn; Hc=Hn;if rho<0.25, Delta = 0.25*norm(dx); endif rho>0.75 && norm(dx,2)>((1-eps)*Delta),

Delta = min([2*Delta;Dmax]);enddisp([sprintf(’%3d’,k), sprintf(’ %12.4e’,[xn’,fn]),...sprintf(’ %10.2e’,Delta),sprintf(’ %6.4f’,norm(dx)/Delta),...sprintf(’ %7.3f’,rho),sprintf(’ %8.1e’,e1),...sprintf(’ %8.1e’,e2)]);if e1<eps1 || e2<eps2, break, end

elsexn = xc; % Reducción de función mala; seguir igual

endenddisp([’Óptimo: x’’ =’,sprintf(’ %22.16e’,xc),’.’]);disp([’Función: f(x) =’,sprintf(’ %22.16e’,fc),’.’]);

end

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

48/63

� Resolvamos con este procedimiento

minimizarx2R2

0;5x21.x21=6C 1/C x2 arctan.x2/ � 0;5 ln.x22 C 1/

>> [x f]=Dogleg_UBC_yo(@fun5_2,[2;2]);

k x_k(1) x_k(2) f(x_k) Delta |dx|/Del ro e_1 e_20 2.0000e+000 2.0000e+000 4.7429e+000 5.00e-0011 1.5135e+000 1.8846e+000 2.8658e+000 1.00e+000 0.5000 1.040 2.8e+000 1.4e+0002 5.8687e-001 1.5086e+000 1.0754e+000 2.00e+000 0.5000 1.233 2.6e+000 1.4e+0003 -4.1104e-001 -2.2464e-001 1.1188e-001 4.00e+000 0.5000 0.781 2.1e+000 1.6e+0004 -3.9606e-002 7.4824e-003 8.1253e-004 4.00e+000 0.1095 1.045 1.0e+000 1.9e+0005 -4.1354e-005 -2.7927e-007 8.5513e-010 4.00e+000 0.0101 1.001 1.0e+000 1.1e-0016 -4.7149e-014 1.4505e-020 1.1115e-027 4.00e+000 0.0000 1.000 1.0e+000 1.5e-019

Óptimo: x’ = -4.7148815071357958e-014 1.4505439221729893e-020.Función: f(x) = 1.1115053813167661e-027.

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

49/63Método de los gradientes conjugados

� La idea es extender a problemas no lineales generales el método que vimos paraminimizar funciones cuadráticas2.

� Fue propuesto por los profesores británicos R. Fletcher y C. Reeves en 1964.

Roger Fletcher, Reino Unido, 1939-.

� La dirección de descenso de este método es

p D �rf .x/C ˇppr

La nueva dirección p y la previa, ppr , han de ser conjugadas con respecto a lamatriz Hessiana.

2Y resolver así sistemas de ecuaciones lineales.

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

50/63� El parámetro ˇ se determina de tal manera que minimice una forma cuadráticade la matriz Hessiana pues suficientemente cerca del óptimo las funcionescontinuas se aproximan muy bien a formas cuadráticas.


!


!


!


� La primera fórmula para ˇ fue sugerida por Fletcher y Reeves:

ˇ D rf .x/Trf .x/rf .xpr/Trf .xpr/.

� La de Polak y Ribière, posteriormente, en 1971:

ˇ D�rf .x/ � rf .xpr/�T rf .x/rf .xpr/Trf .xpr/

. Elijah Polak, Polonia,1931-.

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

51/63

� Algoritmo de los gradientes conjugados para minimizar f .x/:

Dados La función f .x/, un punto de partida x0 y una tolerancia.

Calcular ppr D �rf .x0/Repetir Mientras la aproximación a la solución > tolerancia:

1. Calcular dir. de descenso: p D �rf .x/C ˇppr ,

con ˇ D rf .x/Trf .x/rf .xpr/Trf .xpr/

o ˇ D�rf .x/ � rf .xpr/�T rf .x/rf .xpr/Trf .xpr/

.

2. Calcular la amplitud de paso, ˛, en esa dirección.

3. Calcular el nuevo punto x WD x C ˛p.

� Ninguna de las dos fórmulas de ˇ requiere calcular la matriz Hessiana.

� Si la función a minimizar es cuadrática las dos fórmulas estudiadas coinciden.

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

52/63

� Calculemos para probar estos métodos el mínimo de la función de Rosenbrock:

minimizarx2R2

100�x2 � x21

�2 C .1 � x1/2:

−0.5 −0.4 −0.3 −0.2 −0.1 0 0.1 0.2 0.3 0.4 0.5−0.5

−0.4

−0.3

−0.2

−0.1

0

0.1

0.2

0.3

0.4

0.5

x1

x 2

1

1

1

1

2

2

2

2

22

2

3

33

3

3

3

3

3

4

4 4

4

4

4

4

4

5

55

5

5

5 5

5

10

10

10

10

1010

10

15

15

15

15

1515

15

Figure 5: Contour plot of Rosenbrock’s banana function.

where K is the cone of 6 × 6 PSD matrices. Following the terminology introduced in[7, 8], the above matrix is referred to as the moment, or measure matrix associated withthe LMI relaxation. Because the above moment matrix contains relaxations of monomialsof degree up to 2+2=4, it is referred to as the second-degree moment matrix. The aboveupper-left 3x3 submatrix contains relaxations of monomials of degree up to 1+1=2, so itis referred to as the first-degree moment matrix.

Now replacing the monomials in the criterion by their relaxed variables, the first LMIrelaxation of Rosenbrock’s banana function minimization reads

max −1 + 2y10 − y20 − 100y02 + 200y21 − 100y40

s.t.

1 y10 y01 y20 y11 y02y10 y20 y11 y30 y21 y12y01 y11 y02 y21 y12 y03y20 y30 y21 y40 y31 y22y11 y21 y12 y31 y22 y13y02 y12 y03 y22 y13 y04

∈ K.

For a comprehensive description of the way LMI relaxations are build (relaxations ofhigher orders, moment matrices of higher degrees and moment matrices associated withconstraints), the interested reader is advised to consult [7, 8]. All we need to know hereis that an LMI relaxation of a non-convex optimization problem can be expressed as a

17

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

53/63

� Utilicemos para ello este programa en Matlab.

function [x f] = Grad_Conjugados_unc(fun,x,par)% Método de los gradientes conjugados para minimizar f(x)rho = 0.01; beta = 0.1; % Parámetros de line searchmaxit = 1000;[f g]= fun(x); ppr=-g; gpr=g; pbeta=0; % Comienzo con máx. pendientefor i=1:maxit % Proceso iterativo

if i>1[f g] = fun(x);if par==1, pbeta=(g’*g)/(gpr’*gpr); % Fletcher-Reeveselse pbeta=((g-gpr)’*g)/(gpr’*gpr); % Polak-Ribièreend

endp = -g+pbeta*ppr;if (norm(g,inf) < 1e-6), break, end % Condición de mínimoalpha = 1;for k=1:10 % Backtracking amplitud de paso

xnew = x+alpha*p; fxnew = fun(xnew);if fxnew <= f + alpha*rho*g’*p

breakelse alpha = alpha*beta;end

endx = x + alpha*p; ppr=p; gpr=g;fprintf(’%4.0f %13.8e %13.8e %13.8e %13.8en’,i,x,f,alpha);

endend

function [f g] = objfun_min3(x) % Rosenbrockf = 100*(x(2)-x(1)^2)^2+(1-x(1))^2;if nargout<2, return, endg = [-200*(x(2)-x(1)^2)*2*x(1)-2*(1-x(1));200*(x(2)-x(1)^2)];

end

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

54/63

� Se parte del punto x D Œ�1;2 1�. El óptimo es x� D Œ1 1�. La función y elcamino hacia el óptimo son los de la figura.

35 4. CONJUGATEGRADIENT METHODS

4.5. Convergence PropertiesIn Theorem 4.8 we saw that the search directionshcg of a conjugate gradi-ent method are descent directions and thus theθ of (2.12) satisfiesθ<π/2.There is no guarantee, however, that theµ of Definition 2.13 will stay con-stant, and Theorem 2.18 is therefore not directly applicable.

For many years it was believed that to guarantee convergence of a conjugategradient method it would be necessary to use a complicated ad hoc linesearch, and perhaps make some other changes to the method. But in 1985Al-Baali managed to prove global convergence using a traditional soft linesearch.

Theorem 4.14.Let the line search used in Algorithm 4.6 satisfy (2.16)and (2.17) with parameter values%<β < 0.5. Then there is ac> 0such that for allk

f ′(x)>hcg ≤ −c ‖f ′(x)‖22 andlimk→∞

‖f ′(x)‖2 = 0 .

Proof. See Al-Baali (1985).

Let us finally remark on the rate of convergence. Crowder and Wolfe (1972)show that conjugate gradient methods with exact line search have a linearconvergence rate, as defined in (2.4). This should be contrasted with thesuperlinear convergence rate that holds for Quasi-Newton methods and thequadratic convergence rate that Newton’s method possesses.

Example 4.3. Rosenbrock’s function,

f(x) = 100(x2 − x21)

2+ (1− x1)2 ,

is widely used for testing optimization algorithms. Figure 4.2 shows level curvesfor this function (and illustrates, why it is sometimes called the“banana func-tion” ).

4.5. Convergence Properties 36

−1.5 1.5

−0.5

2

x1

x2

300

100

30

103

10.3

Figure 4.2:Contours of Rosenbrock’s function.

The function has one minimizerx∗ = [1, 1]> (marked by a+ in the figure) withf(x∗) = 0, and there is a “valley” with sloping bottom following the parabolax2 = x2

1. Most optimization algorithms will try to follow this valley. Thus,a considerable amount of iteration steps is needed, if we takex0 in the 2ndquadrant.

Below we give the number of iteration steps and evaluations off(x) andf ′(x)when applying Algorithm 4.6 on this function. In all cases we use the startingpointx0 = [−1.2, 1 ]>, and stopping criteria given byε1 = 10−8, ε2 = 10−12

in (4.7). In case of exact line search we useτ = 10−6, ε = 10−6 in (2.29),while we takeβ = 10−1, % = 10−2 in Algorithm 2.27 for soft line search.

Method Line search # it. steps # fct. evalsFletcher–Reeves exact 118 1429Fletcher–Reeves soft 249 628Polak–Ribiere exact 24 266Polak–Ribiere soft 45 130

Thus, in this case the Polak–Ribiere method with soft line search performsbest. Below we give the iterates (cf. Figure 4.2) and the values off(xk) and‖f ′(xk)‖∞; note the logarithmic ordinate axis.

37 4. CONJUGATEGRADIENT METHODS

−1.2 1

1

x1

x2

0 10 20 30 40 501e−15

1e−10

1e−5

1

f||f’||

Figure 4.3:Polak–Ribiere method with soft line searchapplied to Rosenbrock’s function.Top: iteratesxk. Bottom:f(xk) and‖f ′(xk)‖∞.

4.6. ImplementationTo implement a conjugate gradient algorithm in a computer program, somedecisions must be made. Of course we need to choose a formula forγ; werecommend the Polak–Ribiere formula.

We also need to specify the exactness of the line search. For Newton-typemethods it is usually recommended that the line search be quite liberal, sofor the line search in Algorithm 2.27 it is common to choose the parame-ter values%= 0.01 andβ= 0.9. For conjugate gradient methods experiencedictates that a line search with stricter tolerances be used, say%= 0.01 andβ= 0.1. In addition we have to specify the stopping criterion; (2.9) is rec-ommended. Since we do not have access tof ′′(xk), we cannot use (2.10).For methods with a fast convergence rate, (2.8) may be quite satisfactory, but

4.7. CG Method for Linear Systems 38

its use for conjugate gradient methods must be discouraged because their fi-nal convergence rate is only linear.

Finally some remarks on the storage of vectors. The Fletcher–Reevesmethod may be implemented using threen-vectors of storage,x , g andh.If these containx, f ′(x) andhprev at the beginning of the current iterationstep, we may overwriteh with hcg and during the line search we overwritex with x+αhcg andg with f ′(x+αhcg). Before overwriting the gradient,we find f ′(x)

>f ′(x) for use in the denominator in (4.11) on the next iter-

ation. For the Polak–Ribiere method we need acces tof ′(x) andf ′(xprev)simultaneously, and thus four vectors are required, sayx , g, gnew andh.

4.7. The CG Method for Linear SystemsWe cannot part with conjugate gradient methods without mentioning thatthey can of course be used to minimize the quadratic function (4.2) itself.But by (4.3) this is equivalent to solving the positive definite linear system

Hx = −b .

Let g denote the current gradient,

g = q′(x) = Hx + b ,

and letu = H hcg. It is easily verified that the exact step lengthα may becalculated directly,

α =−u>g

u>hcg,

and thatx andg are updated by

x := x + αhcg, g := g + αu .

The Fletcher–Reeves and the Polak–Ribiere formulas are equivalent in thissetting,

γ =g>g

g>prevgprev.

Thus, the method can be implemented using fourn-vectors,x , g, h, u.

� Con la tolerancia y los parámetros dados, la convergencia es esta.

Método Número iteracionesPolak–Ribière 103Fletcher–Reeves 514

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

55/63

� Probemos:» [x f] = Grad_Conjugados_unc(@objfun_min3,[-1.2;1],1)

1 -9.84400000e-001 1.08800000e+000 2.42000000e+001 1.00000000e-0032 -1.01771362e+000 1.06810987e+000 5.35291158e+000 1.00000000e-003..

513 1.00000141e+000 1.00000283e+000 2.00451765e-012 1.00000000e-004x =

1.0000014147806741.000002834445620

f =2.003988012156511e-012

» [x f] = Grad_Conjugados_unc(@objfun_min3,[-1.2;1],2)1 -9.84400000e-001 1.08800000e+000 2.42000000e+001 1.00000000e-0032 -9.84399999e-001 1.08800000e+000 5.35291158e+000 1.00000000e-010..

102 1.00000121e+000 1.00000243e+000 1.47631065e-012 1.00000000e-003x =

1.0000012126798991.000002430217752

f =1.472951080761591e-012

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

56/63

Métodos cuasi Newton

� Son algoritmos basados en el método de Newton que calculan una aproximaciónde la matriz hessiana, o variantes de ella, mediante fórmulas de recurrencia.

� Si la fórmula de la dirección de descenso de Newton es

p D � �r2f .x/��1rf .x/se utiliza

p D �Hrf .x/;donde H es una aproximación de la matriz inversa de la hessiana.

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

57/63� Las fórmulas más comunes de H son:

DFP De Davidon, Fletcher y Powell:

H kC1 D H k �H kyky

Tk H k

yTk H kykC sks

Tk

yTk sk:

Quasi-Newton iterations for optimizationThe field was launched between 1959 and 1970.

William Davidon 1927-1954 PhD in Physics, U. Chicago1959: “variable metric” report at Argonne National Lab.

(It was finally published in 1991, first issue of SIOPT )

1961-1991: Prof. of Physics and Maths, Haverford Coll

Michael Powell 1936-1959-1976 Harwell A.E.R.E.1976- DAMTP, U. of Cambridge1976- DAMTP, U. of Cambridge1983 FRS

Charles Broyden 1933-20111955-1965: English Electric1965: “good” and “bad” Broyden methods1967-1986 U. of Essex1990-2003 U. of Bologna

Roger Fletcher 1939-1969-1973 Harwell A.E.R.E…. U. of Leeds1963: Davidon-Fletcher-Powell paper1971-2005 U. of Dundee2003 FRS

Quasi-Newton iterations for optimizationThe field was launched between 1959 and 1970.

William Davidon 1927-1954 PhD in Physics, U. Chicago1959: “variable metric” report at Argonne National Lab.

(It was finally published in 1991, first issue of SIOPT )

1961-1991: Prof. of Physics and Maths, Haverford Coll

Michael Powell 1936-1959-1976 Harwell A.E.R.E.1976- DAMTP, U. of Cambridge1976- DAMTP, U. of Cambridge1983 FRS

Charles Broyden 1933-20111955-1965: English Electric1965: “good” and “bad” Broyden methods1967-1986 U. of Essex1990-2003 U. of Bologna

Roger Fletcher 1939-1969-1973 Harwell A.E.R.E…. U. of Leeds1963: Davidon-Fletcher-Powell paper1971-2005 U. of Dundee2003 FRS

William Davidon, Roger Fletcher yMichael Powell.

BFGS Formulada por Broyden, Fletcher, Goldfarb y Shanno:

H kC1DH kC�1C yTk H kyk

yTk sk

�sksTkyTksk��sky

Tk H k CH kyks

Tk

�yTk sk

;

Charles Broyden, Roger Fletcher,Donald Goldfarb y David Shanno.

donde yk D rf .xkC1/ � rf .xk/ y sk D xkC1 � xk D ˛kpk :

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

58/63

� Algoritmo para minimizar f .x/.

Dados f .x/, un x0, una tolerancia y H 0

Repetir Mientras no se satisfaga la tolerancia de solución:

1. Determinar la dirección de descenso:

Obtener pk D �H krf .xk/2. Calcular la amplitud de paso, ˛k , en esa dirección

3. Calcular el nuevo punto xkC1 WD xk C ˛kpk y H kC1

� En Matlab:

function [x,i,f,nfeval] = quasi_newton_1(fun,x,metodo)% Método cuasi Newtonrho = 0.01; beta = 0.1; % Parámetros de line search[f,g] = fun(x); H=eye(length(x)); eps1=1e-5; eps2=1e-8;maxit = 1000; nfeval = 1;for i=1:maxit

xp = x; gp = g;p = -H*g; nh = norm(p); % Nueva dirección: cuasi Newtonif norm(g,inf)<eps1 || nh <= eps2*(eps2 + norm(x)), break, endalpha = 1;for k=1:10 % Backtracking amplitud de paso

xnew = x+alpha*p;fxnew = fun(xnew); nfeval = nfeval+1;if fxnew < f+alpha*rho*g’*p, breakelse alpha=alpha*beta;end

endx = x + alpha*p; s = x - xp; ns = norm(s);[f,g] = fun(x);y = g - gp; ys = y’*s; nfeval = nfeval+1;if ys > sqrt(eps)*ns*norm(y) % Recalcular H

v = H*y; yv = y’*v;if metodo == 1, H=H - (v/yv)*v’ + (s/ys)*s’; % DFPelse H = H + (1+yv/ys)*(s*s’)/ys - (s*v’+v*s’)/ys; % BFGSend

endend

end

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

59/63

� Para obtener la solución habría que hacer algo parecido a lo que sigue.

>> [x i f nf]=quasi_newton_1(@objfun_min3,[-1.2;1],1)x =

0.9999866008195810.999973154300098

i =247

f =1.797638376605058e-010

nf =506

>> [x i f nf]=quasi_newton_1(@objfun_min3,[-1.2;1],2)x =

0.9999998731700810.999999758419957

i =40

f =3.067793488101831e-014

nf =88

� La función objetivo y su gradiente están en objfun_min3, ya utilizada.

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

60/63

� La sesión con el guión de Matlab fminunc sería la siguiente.

>> options = optimset(’Display’,’iter’,’GradObj’,’on’,’LargeScale’,’off’...,’HessUpdate’,’bfgs’);>> x0=[-1.2;1];>> [x,fval,exitflag,output] = fminunc(@objfun_min3,x0,options)


0 1 24.2 2161 3 4.28049 0.000861873 15.22 4 4.12869 1 33 5 4.12069 1 1.54 6 4.1173 1 1.625 7 4.08429 1 5.726 8 4.02491 1 10.47 9 3.9034 1 17.48 10 3.7588 1 20.19 11 3.41694 1 19.9

10 12 2.88624 1 11.911 13 2.4428 1 9.7812 14 1.93707 1 3.0113 17 1.64357 0.141994 5.5414 18 1.52561 1 7.5715 19 1.17013 1 4.5316 20 0.940886 1 3.1717 21 0.719844 1 5.1518 22 0.409164 1 5.7319 25 0.259747 0.0842377 5.01


20 26 0.238901 1 1.0621 27 0.2102 1 1.2222 29 0.18249 0.584225 3.3323 30 0.15856 1 2.9124 31 0.0893558 1 0.75625 33 0.0726393 0.438248 3.1826 34 0.0413887 1 2.327 35 0.0221877 1 0.49128 37 0.0126281 0.405833 1.9829 38 0.00703352 1 1.3530 39 0.00203299 1 0.19431 41 0.00109124 0.5 0.87432 42 8.941e-005 1 0.1433 43 7.16329e-006 1 0.080434 44 4.44047e-007 1 0.022235 45 1.49386e-008 1 0.0026336 46 9.03834e-013 1 1.65e-005

Local minimum found.

Optimization completed because the size of the gradient is less thanthe default value of the function tolerance.

<stopping criteria details>

x =0.9999991240198150.999998284985335

fval =9.038341201889023e-013

exitflag =1

output =iterations: 36funcCount: 46stepsize: 1

firstorderopt: 1.652992217925215e-005algorithm: ’medium-scale: Quasi-Newton line search’

message: [1x438 char]

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

61/63� Recapitulemos resolviendo este problema de optimización con varios métodos

minimizarx2R2

0;5x21.x21=6C 1/C x2 arctan.x2/ � 0;5 ln.x22 C 1/;

� Con Newton:

>> [x f]=Newton_unc(@fun5_2,[2;2])1 1.53333333e+000 -7.67871794e-001 4.74291181e+000 5.00000000e-0012 7.17182434e-001 2.73081655e-001 1.90728360e+000 1.00000000e+0003 1.62394466e-001 -1.33801795e-002 3.16058360e-001 1.00000000e+0004 2.78174708e-003 1.59690475e-006 1.33334500e-002 1.00000000e+0005 1.43502115e-008 -2.71495235e-018 3.86906467e-006 1.00000000e+000

x =1.0e-007 *0.143502114982597

-0.000000000027150f =

1.029642850223918e-016

� Máxima pendiente:

>> [x f]=Maxima_pendiente_unc(@fun5_2,[2;2])1 -3.33333333e-001 1.44642564e+000 4.74291181e+000 5.00000000e-0012 1.23456790e-002 4.80532665e-001 8.89243872e-001 1.00000000e+0003 -6.27225474e-007 3.25798596e-002 1.11454631e-001 1.00000000e+0004 8.22680750e-020 1.15199317e-005 5.30629777e-004 1.00000000e+0005 0.00000000e+000 5.09598738e-016 6.63544598e-011 1.00000000e+000

x =1.0e-015 *

00.509598737990710

f =2.596908737617245e-031

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

62/63

� Con cuasi Newton:

>> [x i f nf]=quasi_newton_1(@fun5_2,[2;2],2)x =

1.0e-005 *-0.4666333930777670.043763046313533

i =10

f =1.098304435204189e-011

nf =20

� Con el método de losgradientes conjugados:

>> [x f] = Grad_Conjugados_unc(@fun5_2,[2;2],2)1 1.53333333e+000 1.88928513e+000 4.74291181e+000 1.00000000e-0012 -1.24817638e-001 1.06078252e+000 2.92446293e+000 1.00000000e+0003 -2.60972425e-002 2.32534554e-001 4.95278202e-001 1.00000000e+0004 -1.98278513e-002 1.70462426e-001 2.71382019e-002 1.00000000e+0005 -1.20583962e-003 1.35873959e-002 1.46557482e-002 1.00000000e+0006 -1.37365260e-003 1.15727268e-002 9.30328488e-005 1.00000000e+0007 2.08186271e-005 2.50443071e-004 6.79059687e-005 1.00000000e+0008 -2.88173500e-005 2.33979887e-004 3.15775731e-008 1.00000000e+0009 1.90328536e-006 6.31282977e-007 2.77885134e-008 1.00000000e+000

10 -4.91063765e-008 3.73003496e-007 2.01048075e-012 1.00000000e+000x =

1.0e-006 *-0.0491063764567620.373003496268343

f =7.072634268876572e-014

h i j

d e f g

a b c

10 8 7

9 4 6 5

1 2 3

63/63� Con el método de región de confianza DogLeg con modelo de Newton:

>> [x f] = Dogleg_UBC_yo(@fun5_2,[2;2])k x_k(1) x_k(2) f(x_k) Delta |dx|/Del ro e_1 e_20 2.0000e+000 2.0000e+000 4.7429e+000 5.00e-0011 1.5135e+000 1.8846e+000 2.8658e+000 1.00e+000 0.5000 1.040 2.8e+000 1.4e+0002 5.8687e-001 1.5086e+000 1.0754e+000 2.00e+000 0.5000 1.233 2.6e+000 1.4e+0003 -4.1104e-001 -2.2464e-001 1.1188e-001 4.00e+000 0.5000 0.781 2.1e+000 1.6e+0004 -3.9606e-002 7.4824e-003 8.1253e-004 4.00e+000 0.1095 1.045 1.0e+000 1.9e+0005 -4.1354e-005 -2.7927e-007 8.5513e-010 4.00e+000 0.0101 1.001 1.0e+000 1.1e-0016 -4.7149e-014 1.4505e-020 1.1115e-027 4.00e+000 0.0000 1.000 1.0e+000 1.5e-019

Óptimo: x’ = -4.7148815071357958e-014 1.4505439221729893e-020.Función: f(x) = 1.1115053813167661e-027.x =

1.0e-013 *-0.47150.0000

f =1.1115e-027

� Con el método de Newton amortiguado:

>> [x f] = Newton_amortiguado(@fun5_2,[2;2])1 1.22222222e+000 1.07737607e+000 1.43392805e+000 3.33333333e-0012 5.74640142e-001 4.41028668e-002 1.75164488e-001 1.11111111e-0013 1.32066691e-001 4.36656594e-003 8.75568971e-003 3.70370370e-0024 6.09495146e-003 1.55898129e-004 1.85864837e-005 1.23456790e-0025 7.44750479e-005 1.90119424e-006 2.77507370e-009 4.11522634e-0036 3.05225879e-007 7.79177968e-009 4.66421303e-014 1.37174211e-0037 4.18117642e-010 1.06736708e-011 8.75251085e-020 4.57247371e-004

x =1.0e-009 *

0.41810.0107

f =8.7525e-020

Optimización - jldelafuenteoconnor.es · h i j d e f g a b c 10 8 7 9 465 12 3 Teorema Condiciones...

Documents

Transcript of Optimización - jldelafuenteoconnor.es · h i j d e f g a b c 10 8 7 9 465 12 3 Teorema Condiciones...