Initial Step Length Derivation In Nocedal & Wright


Introduction

In numerical optimization, the choice of step length is a crucial part of any line-search method. The step length determines how far the algorithm moves along the current search direction, and it has a significant effect on the convergence of the algorithm. In the book "Numerical Optimization" by Jorge Nocedal and Stephen J. Wright, the authors discuss several strategies for choosing the step length. In this article, we focus on the derivation of the initial step length presented in the book.

Background

Optimization algorithms, such as gradient descent, rely on a step length to move from one point to the next in the search space. At each iteration the step length is chosen to produce a sufficient decrease in the objective function along the search direction, and it is typically determined by a line search algorithm, which evaluates the objective function at trial points along that direction. Because every trial point costs a function evaluation, a good initial guess for the step length reduces the work of the line search, and this is the motivation for the strategy discussed below.
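To make this concrete, here is a minimal sketch of a backtracking line search in Python. It is only an illustration under simple assumptions, not a routine reproduced from the book: the names f, grad_f_x, x, p, rho, and c are hypothetical choices, and the loop simply shrinks the trial step until a sufficient-decrease (Armijo) condition holds.

import numpy as np

def backtracking_line_search(f, grad_f_x, x, p, alpha_init=1.0, rho=0.5, c=1e-4):
    """Shrink the trial step until the sufficient-decrease (Armijo) condition holds."""
    alpha = alpha_init
    f_x = f(x)
    slope = np.dot(grad_f_x, p)  # directional derivative; negative for a descent direction
    # Reduce alpha geometrically until f(x + alpha * p) lies below the Armijo line.
    while f(x + alpha * p) > f_x + c * alpha * slope:
        alpha *= rho
    return alpha

The quality of alpha_init in a sketch like this is exactly what the initial step length strategy below is meant to improve.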

Interpolating a Quadratic

On page 59 of the book, Nocedal and Wright present an alternative strategy for choosing the initial step length. The strategy is to interpolate a quadratic to the data $f(x_{k-1})$, $f(x_k)$, and $\nabla f_{k-1}^T p_{k-1}$, and to define the initial step length $\alpha_0$ to be the minimizer of that quadratic. This strategy is based on the idea that the quadratic can be used to approximate the objective function in the neighborhood of the current point.

Derivation of the Initial Step Length

To derive the initial step length, we restrict the objective function to the previous search direction and model it with a one-dimensional quadratic. Define

$$\varphi(\alpha) = f(x_{k-1} + \alpha p_{k-1}).$$

Two of the three data values fix the behaviour of the model at $\alpha = 0$: the function value $\varphi(0) = f(x_{k-1})$ and the slope $\varphi'(0) = \nabla f_{k-1}^T p_{k-1}$. A quadratic that matches both is

$$q(\alpha) = f(x_{k-1}) + \nabla f_{k-1}^T p_{k-1}\,\alpha + c\,\alpha^2,$$

where the coefficient $c$ is determined by the third data value, $f(x_k)$: we require the model to attain the value $f(x_k)$ at its minimizer, which amounts to treating the previous step as having landed on the minimizer of the quadratic.

To find the minimizer of the quadratic, we take the derivative with respect to $\alpha$ and set it equal to zero:

$$q'(\alpha) = \nabla f_{k-1}^T p_{k-1} + 2c\,\alpha = 0.$$

Solving for $\alpha$, we get

$$\alpha^* = -\frac{\nabla f_{k-1}^T p_{k-1}}{2c}.$$

Substituting this value back into $q$ gives the minimum value of the model,

$$q(\alpha^*) = f(x_{k-1}) - \frac{(\nabla f_{k-1}^T p_{k-1})^2}{4c}.$$

Setting $q(\alpha^*) = f(x_k)$ and solving for $c$ gives

$$c = -\frac{(\nabla f_{k-1}^T p_{k-1})^2}{4\left(f(x_k) - f(x_{k-1})\right)},$$

which is positive because the previous step decreased the objective, $f(x_k) < f(x_{k-1})$. Substituting this value of $c$ into the expression for $\alpha^*$ yields the initial step length

$$\alpha_0 = \frac{2\left(f(x_k) - f(x_{k-1})\right)}{\nabla f_{k-1}^T p_{k-1}}.$$

This is the initial step length derived by Nocedal and Wright. Both the numerator and the denominator are negative ($p_{k-1}$ is a descent direction and the previous step decreased $f$), so $\alpha_0 > 0$, and the formula involves only quantities that were already computed during the previous iteration.
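As a quick sanity check (an illustrative example of our own, not one taken from the book), consider $f(x) = x^2$ with $x_{k-1} = 1$ and descent direction $p_{k-1} = -1$, and suppose the previous step landed on the minimizer $x_k = 0$. Then $f(x_{k-1}) = 1$, $f(x_k) = 0$, and $\nabla f_{k-1}^T p_{k-1} = 2 \cdot (-1) = -2$, so

$$\alpha_0 = \frac{2\,(0 - 1)}{-2} = 1,$$

which is exactly the step length that was needed to move from $x_{k-1} = 1$ to the minimizer.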

Advantages of the Initial Step Length Derivation

The initial step length derivation presented by Nocedal and Wright has several advantages. Firstly, it provides a simple and inexpensive way to choose the first trial step length, which helps the line search converge quickly. Secondly, it is based on a quadratic model, which is usually a good approximation of the objective function in the neighborhood of the current point. Finally, it uses only quantities that are already available from the previous iteration, namely two function values and one directional derivative, and in particular it does not require the computation of the Hessian matrix, which can be computationally expensive.

Conclusion

In conclusion, the initial step length derivation presented by Nocedal and Wright provides a simple and efficient way to choose the first trial step length in line-search optimization algorithms. The derivation is based on a one-dimensional quadratic model of the objective along the previous search direction, and the resulting formula reuses information that has already been computed, so it adds essentially no cost and avoids any Hessian computation.

Future Work

Future work in this area could involve exploring other strategies for choosing the step length, such as using machine learning techniques or incorporating additional information from the problem. Additionally, the derivation could be extended to more complex optimization problems, such as those involving multiple objectives or constraints.

References

  • Nocedal, J., & Wright, S. J. (2006). Numerical optimization. Springer Science & Business Media.
  • Powell, M. J. D. (1970). A new algorithm for unconstrained optimization. In J. R. Rice (Ed.), Nonlinear programming (pp. 31-65). Academic Press.

Code

A Python implementation of the initial step length formula derived above is provided below:

import numpy as np

def initial_step_length(f, x_k, x_k_minus_1, grad_f_k_minus_1, p_k_minus_1):
    """
    Compute the initial step length from the interpolating quadratic.

    Parameters:
    f (callable): The objective function, mapping a numpy array to a float.
    x_k (numpy array): The current point.
    x_k_minus_1 (numpy array): The previous point.
    grad_f_k_minus_1 (numpy array): The gradient of the objective function at the previous point.
    p_k_minus_1 (numpy array): The previous search direction.

    Returns:
    alpha_0 (float): The initial step length 2 (f_k - f_{k-1}) / (grad_{k-1}^T p_{k-1}).
    """
    # Directional derivative along the previous search direction (negative for a descent direction)
    dir_deriv = np.dot(grad_f_k_minus_1, p_k_minus_1)

    # Minimizer of the one-dimensional interpolating quadratic
    alpha_0 = 2.0 * (f(x_k) - f(x_k_minus_1)) / dir_deriv

    return alpha_0

This code computes the initial step length from the quadratic interpolation formula derived above. The function takes as input the objective function, the current point, the previous point, the gradient of the objective function at the previous point, and the previous search direction, and it returns the initial step length without forming or inverting a Hessian matrix.
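As a brief usage sketch (with an illustrative quadratic objective chosen for this article, not an example from the book), the function above can seed the next line search. The snippet assumes the definition of initial_step_length given above and the numpy import that precedes it.

# Illustrative usage on f(x) = 0.5 * x^T x (hypothetical example, not from the book)
def f(x):
    return 0.5 * np.dot(x, x)

def grad_f(x):
    return x

x_prev = np.array([2.0, -1.0])
p_prev = -grad_f(x_prev)            # previous steepest-descent direction
x_curr = x_prev + 0.5 * p_prev      # previous step taken with step length 0.5

alpha_0 = initial_step_length(f, x_curr, x_prev, grad_f(x_prev), p_prev)
print(alpha_0)                      # 0.75, the trial step that starts the next line search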