These four constrained optimization methods look similar at first glance:

- Lagrange Multipliers
- Penalty Methods
- Augmented Lagrangian Methods
- Merit Functions

Here is a comprehensive explanation of these four methods, written by Brian Borchers.

No, they’re not all the same and it’s important to understand the differences between them.

Start with a simple optimization problem

${\rm min}\,f({\bf x})$

subject to

${\bf g}({\bf x})=0$

where we can assume for simplicity that $f$ and ${\bf g}$ are smooth (at least twice continuously differentiable).

The Lagrangian function is

$L({\bf x},\boldsymbol{\lambda})=f({\bf x})+\boldsymbol{\lambda}^\top {\bf g}({\bf x})$

Note that $L$ is a function of ${\bf x}$ and $\boldsymbol{\lambda}$. The first order necessary condition for a point ${\bf x}^*$ to be a minimizer is that there is a $\boldsymbol{\lambda}^*$ such that $({\bf x}^*,\boldsymbol{\lambda}^*)$ is a stationary point of $L$. In the method of multipliers, we try to solve the nonlinear system of equations

$\begin{aligned}\nabla_{\bf x}L&={\bf 0}\\ \nabla_{\boldsymbol{\lambda}}L&={\bf 0}\end{aligned}$

This is typically done by alternately minimizing with respect to $\bf x$ and updating $\boldsymbol{\lambda}$. Given a Lagrange multiplier estimate $\boldsymbol{\lambda}^{(k)}$, we minimize $L({\bf x},\boldsymbol{\lambda}^{(k)})$ to get ${\bf x}^{(k)}$. Then we update $\boldsymbol{\lambda}$ with

$\boldsymbol\lambda^{(k+1)}=\boldsymbol\lambda^{(k)}+\alpha_k {\bf g}({\bf x}^{(k)})$

where $\alpha_k$ is a step size parameter that can be set in various ways.
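The alternation between minimizing over ${\bf x}$ and updating $\boldsymbol{\lambda}$ can be sketched on a toy problem. Everything specific here is an assumption made for illustration and is not part of the original answer: the problem (minimize $x_1^2+x_2^2$ subject to $x_1+x_2-1=0$, with exact solution ${\bf x}^*=(0.5, 0.5)$ and $\lambda^*=-1$), the fixed step size `alpha=0.5`, and the function names. The inner minimization is done in closed form, which is only possible because the toy objective is quadratic.

```python
# Toy problem (an assumption for illustration):
#   minimize f(x) = x1^2 + x2^2  subject to  g(x) = x1 + x2 - 1 = 0.

def argmin_lagrangian(lam):
    """Minimize L(x, lam) = x1^2 + x2^2 + lam * (x1 + x2 - 1) over x.

    Setting the x-gradient (2*x1 + lam, 2*x2 + lam) to zero gives x_i = -lam / 2.
    """
    return (-lam / 2.0, -lam / 2.0)


def g(x):
    """Equality constraint residual g(x) = x1 + x2 - 1."""
    return x[0] + x[1] - 1.0


def multiplier_iteration(lam0=0.0, alpha=0.5, iters=50):
    """Alternate the x-minimization and the multiplier update."""
    lam = lam0
    x = argmin_lagrangian(lam)
    for _ in range(iters):
        x = argmin_lagrangian(lam)   # minimize L(x, lam^(k)) to get x^(k)
        lam = lam + alpha * g(x)     # lam^(k+1) = lam^(k) + alpha_k * g(x^(k))
    return x, lam


x_mult, lam_mult = multiplier_iteration()
print(x_mult, lam_mult)
```

For this toy problem the multiplier error is halved at every iteration when `alpha=0.5`, so the iterates approach $(0.5, 0.5)$ and $\lambda=-1$. Other problems generally need a different step size rule.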

A penalty function for our problem is a function that is $0$ if ${\bf g}({\bf x})={\bf 0}$ and greater than $0$ when ${\bf g}({\bf x}) \neq {\bf 0}$. A commonly used penalty function is the quadratic penalty function, with the square taken componentwise:

$\phi({\bf g}({\bf x}))={\bf g}^2 ({\bf x})$

In the penalty function method, we solve an unconstrained problem of the form

${\rm min}\, f({\bf x})+\boldsymbol\rho^\top\phi({\bf g}({\bf x}))$

where $\boldsymbol \rho$ is a penalty parameter that is increased until the solution of the penalized problem is close to satisfying ${\bf g}({\bf x})={\bf 0}$. Note that $\boldsymbol \rho$ is not a Lagrange multiplier in this case.
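As a rough sketch of the penalty loop, consider the toy problem of minimizing $x_1^2+x_2^2$ subject to $x_1+x_2-1=0$. The problem, the tenfold growth factor for $\rho$, the tolerance, and the function names are all assumptions chosen for illustration. With a single constraint, $\boldsymbol\rho$ reduces to a scalar, and the penalized problem has a closed-form minimizer.

```python
def argmin_penalized(rho):
    """Minimize x1^2 + x2^2 + rho * (x1 + x2 - 1)^2 over x.

    Setting the gradient to zero gives x1 = x2 = rho / (1 + 2*rho).
    """
    t = rho / (1.0 + 2.0 * rho)
    return (t, t)


def constraint(x):
    """Equality constraint residual g(x) = x1 + x2 - 1."""
    return x[0] + x[1] - 1.0


def penalty_method(rho0=1.0, growth=10.0, tol=1e-6, max_outer=20):
    """Increase rho until the penalized minimizer nearly satisfies g(x) = 0."""
    rho = rho0
    x = argmin_penalized(rho)
    for _ in range(max_outer):
        x = argmin_penalized(rho)        # solve the unconstrained problem
        if abs(constraint(x)) <= tol:    # close enough to feasibility
            break
        rho *= growth                    # otherwise penalize more heavily
    return x, rho


x_pen, rho_pen = penalty_method()
print(x_pen, rho_pen, constraint(x_pen))
```

In this example the constraint violation is exactly $-1/(1+2\rho)$, so it reaches zero only in the limit of large $\rho$. That is why the penalty parameter has to keep growing.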

For problems with inequality constraints ${\bf g}({\bf x}) \le {\bf 0}$, a commonly used penalty function is

$\phi({\bf g}({\bf x})) = \left({\rm max} \left\{ 0, {\bf g}({\bf x}) \right\}\right)^2$
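A small sketch of this one-sided penalty follows, assuming the constraint is written as $g(x)\le 0$. The 1-D problem (minimize $(x-2)^2$ subject to $x-1\le 0$), the ternary-search minimizer, and all names are hypothetical and chosen only for illustration.

```python
def phi_ineq(g_value):
    """Quadratic penalty for a constraint written as g(x) <= 0.

    It is zero when the constraint holds and grows quadratically otherwise.
    """
    return max(0.0, g_value) ** 2


def penalized(x, rho):
    """Hypothetical 1-D problem: minimize (x - 2)^2 subject to x - 1 <= 0."""
    return (x - 2.0) ** 2 + rho * phi_ineq(x - 1.0)


def minimize_1d(fun, lo, hi, iters=200):
    """Ternary search; valid here because the penalized objective is convex."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if fun(m1) < fun(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


minimizers = {}
for rho in (1.0, 10.0, 100.0, 1000.0):
    minimizers[rho] = minimize_1d(lambda x: penalized(x, rho), -5.0, 5.0)
    print(rho, minimizers[rho])
```

For this problem the penalized minimizer is $(2+\rho)/(1+\rho)$, which approaches the constrained solution $x=1$ from the infeasible side as $\rho$ grows.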

An augmented Lagrangian function combines the penalty function idea with the Lagrangian:

$\hat L({\bf x},\boldsymbol\lambda, \boldsymbol\rho)=f({\bf x})+\boldsymbol\lambda^\top{\bf g}({\bf x})+\boldsymbol\rho^\top\phi({\bf g}({\bf x}))$

Augmented Lagrangian methods minimize $\hat L$ with respect to $\bf x$, update the Lagrange multiplier estimate $\boldsymbol \lambda$ and then (if necessary) update the penalty parameter $\boldsymbol\rho$ in each iteration. In practice, augmented Lagrangian methods outperform simple penalty methods and the method of multipliers.
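The iteration can be sketched on the toy problem of minimizing $x_1^2+x_2^2$ subject to $x_1+x_2-1=0$, whose exact solution is ${\bf x}^*=(0.5,0.5)$ with $\lambda^*=-1$. The text does not specify the multiplier update, so this sketch assumes the common first-order rule: with the penalty term written as $\rho\, g^2$, the update is $\lambda \leftarrow \lambda + 2\rho\, g({\bf x})$. The problem, the fixed $\rho=1$, and the function names are likewise assumptions for illustration.

```python
def argmin_aug_lagrangian(lam, rho):
    """Minimize x1^2 + x2^2 + lam * g(x) + rho * g(x)^2, with g(x) = x1 + x2 - 1.

    Setting the gradient to zero gives x1 = x2 = (2*rho - lam) / (2 + 4*rho).
    """
    t = (2.0 * rho - lam) / (2.0 + 4.0 * rho)
    return (t, t)


def g_eq(x):
    """Equality constraint residual g(x) = x1 + x2 - 1."""
    return x[0] + x[1] - 1.0


def augmented_lagrangian(lam0=0.0, rho=1.0, tol=1e-10, max_outer=100):
    """Alternate x-minimization of the augmented Lagrangian and multiplier updates."""
    lam = lam0
    x = argmin_aug_lagrangian(lam, rho)
    for k in range(max_outer):
        x = argmin_aug_lagrangian(lam, rho)
        viol = g_eq(x)
        if abs(viol) <= tol:
            return x, lam, k
        # Assumed first-order update: the penalty rho * g^2 contributes
        # 2 * rho * g to the multiplier estimate.
        lam = lam + 2.0 * rho * viol
        # rho is kept fixed here; increasing it is not needed for this problem.
    return x, lam, max_outer


x_al, lam_al, outer_iters = augmented_lagrangian()
print(x_al, lam_al, outer_iters)
```

Unlike the pure penalty approach, the constraint violation here shrinks toward zero with $\rho$ held at a moderate value, because the multiplier estimate absorbs the remaining error.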

Merit functions are used in a variety of nonlinear programming algorithms. You’ll most commonly see them used in sequential quadratic programming methods. In these methods, a search direction, ${\bf d}^{(k)}$, is computed at each iteration. The step is from ${\bf x}^{(k)}$ to

${\bf x}^{(k+1)}={\bf x}^{(k)}+\alpha_k {\bf d}^{(k)}$

where the step size parameter $\alpha_k$ is determined by minimizing a merit function

$\alpha_k = {\rm arg}\min_\alpha M({\bf x}^{(k)}+\alpha{\bf d}^{(k)})$

The merit function is typically something like a penalized objective function or an augmented Lagrangian, but there’s a great deal of freedom in the form of the merit function.
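A minimal sketch of a merit-function line search follows. It does not do the exact minimization over $\alpha$ written above. Instead it uses a simple backtracking rule that halves $\alpha$ until the merit function decreases. The $\ell_1$ penalty merit function $M({\bf x})=f({\bf x})+\rho\,|g({\bf x})|$, the value $\rho=10$, the toy problem, and the deliberately overlong search direction are all assumptions for illustration.

```python
def f_obj(x):
    """Toy objective f(x) = x1^2 + x2^2."""
    return x[0] ** 2 + x[1] ** 2


def g_con(x):
    """Toy equality constraint residual g(x) = x1 + x2 - 1."""
    return x[0] + x[1] - 1.0


def merit(x, rho=10.0):
    """l1 penalty merit function: objective plus weighted constraint violation."""
    return f_obj(x) + rho * abs(g_con(x))


def backtracking_step(x, d, shrink=0.5, max_halvings=30):
    """Shrink alpha until the merit function decreases (simple-decrease rule)."""
    alpha = 1.0
    m0 = merit(x)
    for _ in range(max_halvings):
        trial = (x[0] + alpha * d[0], x[1] + alpha * d[1])
        if merit(trial) < m0:
            return alpha, trial
        alpha *= shrink
    return 0.0, x  # no acceptable step found; stay at x


# Hypothetical current iterate and search direction.
alpha_k, x_next = backtracking_step((2.0, 0.0), (-4.0, 1.0))
print(alpha_k, x_next, merit(x_next))
```

Here the full step $\alpha=1$ increases the merit function and is rejected, while $\alpha=0.5$ is accepted. Practical codes usually require a sufficient decrease rather than any decrease.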

These functions and the associated methods are described in many textbooks on nonlinear optimization. A good discussion can be found in Numerical Optimization by Nocedal and Wright.