The Method of Lagrangian Multipliers and The KKT Conditions

This article briefly introduces strategies for finding the stationary points of an objective function subject to one or more equality or inequality constraints.

Consider a continuous optimization problem in standard form,

$$
\min\limits_{\bf x} f({\bf x}) \\ {\rm s.t.}\;\;g_k({\bf x})\leq0,\;\;k=1,2,\cdots,m \\ h_l({\bf x})=0,\;\;l=1,2,\cdots,p
$$

in which

$$
{\bf x} \in \Re^{n} \\ f,g_k,h_l:\Re^n \rightarrow \Re\;\;{\rm for}\;\;k=1,2,\cdots,m\;\;{\rm and}\;\;l=1,2,\cdots,p \\ m,p\geq0
$$

and $f, g_k, h_l$ are all continuously differentiable.

We divide the problem into two cases: $m = 0$ and $m \neq 0$. For the former we introduce the Method of Lagrange Multipliers as the solution strategy; for the latter we briefly introduce the KKT Conditions, which apply when certain regularity conditions are satisfied.

Notice that since a maximization problem can be handled simply by negating the objective function, we use the minimization problem as the general form throughout.

The Method of Lagrange Multipliers

Consider the optimization problem

$$
\min_{\bf x}f({\bf x}) \\ {\rm s.t.}\;\;g_k({\bf x})=0\;\;{\rm for}\;\;k=1,2,\cdots,m.
$$

To transform the constrained problem into an unconstrained one, we define the Lagrangian function,

$$
L({\bf x}, \overrightarrow{\lambda})=f({\bf x})+\sum_{k=1}^m \lambda_k g_k({\bf x})
$$

in which $\lambda_{k} \in \Re$ for $k=1,2,\cdots,m$. Then we have the necessary conditions for an optimal solution:

$$
\left\{\begin{matrix} \bigtriangledown_{\bf x}L={\bf 0} \\ \bigtriangledown_{\overrightarrow{\lambda}}L={\bf 0} \end{matrix}\right.
$$

called the stationary equation and the constraints, respectively. Solving these $n+m$ simultaneous equations yields the stationary points ${\bf x}^*$ and the corresponding multipliers $\overrightarrow{\lambda}^*$.
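As a sketch of this procedure, the $n+m$ simultaneous equations can be solved numerically with a root finder. The problem below (minimize $x^2+y^2$ subject to $x+y=1$) is a hypothetical example, not one from the article, and `scipy.optimize.fsolve` is just one convenient solver choice:

```python
import numpy as np
from scipy.optimize import fsolve

# Hypothetical example: minimize f(x, y) = x^2 + y^2 subject to g(x, y) = x + y - 1 = 0.
# Lagrangian: L(x, y, lam) = x^2 + y^2 + lam * (x + y - 1).
# Setting grad L = 0 gives n + m = 2 + 1 = 3 simultaneous equations.
def stationary_equations(v):
    x, y, lam = v
    return [
        2 * x + lam,  # dL/dx = 0
        2 * y + lam,  # dL/dy = 0
        x + y - 1,    # dL/dlam = 0, i.e. the constraint itself
    ]

x, y, lam = fsolve(stationary_equations, x0=[0.0, 0.0, 0.0])
print(x, y, lam)  # stationary point (0.5, 0.5) with multiplier -1
```

Here the system happens to be linear, so the root finder recovers the unique stationary point exactly.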

${\bf N{\scriptsize OTE}}.$ Since the method of Lagrange multipliers yields stationary points directly, minimization and maximization problems are treated in the same way.

${\bf E{\scriptsize XAMPLE}}.$ Maximize $f(x,y,z) = 8xyz$ subject to $\dfrac{x^2}{a^2}+\dfrac{y^2}{b^2}+\dfrac{z^2}{c^2}=1$.

${\bf S{\scriptsize OLUTION}}.$ Form the Lagrangian function

$$
L(x,y,z,\lambda)=8xyz+\lambda\left(\dfrac{x^2}{a^2}+\dfrac{y^2}{b^2}+\dfrac{z^2}{c^2}-1\right)
$$

Then compute the gradient of $L$ and set it to ${\bf 0}$:

$$
\bigtriangledown L=\begin{bmatrix} 8yz+\dfrac{2\lambda x}{a^2} & 8xz+\dfrac{2\lambda y}{b^2} & 8xy+\dfrac{2\lambda z}{c^2} & \dfrac{x^2}{a^2}+\dfrac{y^2}{b^2}+\dfrac{z^2}{c^2}-1 \end{bmatrix} = {\bf 0}
$$

Solving, and keeping the positive root since we seek a maximum, we get

$$
\left\{\begin{matrix} x=\dfrac{\sqrt{3}}{3}a \\ y=\dfrac{\sqrt{3}}{3}b \\ z=\dfrac{\sqrt{3}}{3}c \end{matrix}\right.
$$

Since $f$ is continuous on the closed, bounded ellipsoid, the maximum must exist. Substituting the solution gives

$$
f_{\rm max}(x,y,z)=\dfrac{8\sqrt{3}}{9}abc
$$
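A quick numerical sanity check of this example, using `scipy.optimize.minimize` with arbitrarily chosen semi-axes $(a, b, c) = (1, 2, 3)$; the SLSQP solver minimizes, so we negate the objective to maximize:

```python
import numpy as np
from scipy.optimize import minimize

# Semi-axes chosen arbitrarily for this check.
a, b, c = 1.0, 2.0, 3.0

# SLSQP minimizes, so minimize -8xyz to maximize 8xyz.
objective = lambda v: -8.0 * v[0] * v[1] * v[2]
constraint = {
    "type": "eq",
    "fun": lambda v: v[0]**2 / a**2 + v[1]**2 / b**2 + v[2]**2 / c**2 - 1.0,
}

res = minimize(objective, x0=[0.5, 1.0, 1.5], method="SLSQP", constraints=[constraint])
print(res.x)     # close to (a, b, c) * sqrt(3)/3
print(-res.fun)  # close to 8*sqrt(3)/9 * a*b*c
```

The starting point lies in the positive octant, so the solver converges to the maximizer found analytically above.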

KKT Conditions

For convenience, we first consider the case with no equality constraints and a single inequality constraint:

$$
\min_{\bf x}f({\bf x}) \\ {\rm s.t.} \;\; g({\bf x}) \leq 0.
$$

Then define the feasible region $\mathbb{K}=\left\{ {\bf x} \in \Re^n \,|\, g({\bf x}) \leq 0 \right\}$ and assume that ${\bf x}^*$ is the optimal solution under the constraint $g$. According to whether ${\bf x}^*$ lies on the boundary of $\mathbb{K}$ or not, we can divide the problem into two cases and discuss them separately.

${\bf C{\scriptsize ASE}\; 1}.\;\; g({\bf x}^*) <0.$ The optimal solution lies in the interior of $\mathbb{K}$, so we call ${\bf x}^*$ an interior solution. Clearly an infinitesimal displacement of the point in any direction will not violate the constraint, so we say the constraint $g$ is inactive.

${\bf C{\scriptsize ASE}\;2}.\;\;g({\bf x}^*)=0.$ The optimal solution lies on the boundary of $\mathbb{K}$, so we call ${\bf x}^*$ a boundary solution. Correspondingly, we say the constraint $g$ is active.

As before, define the Lagrangian function

$$
L({\bf x},\lambda)=f({\bf x})+\lambda g({\bf x})
$$

Depending on whether $g$ is active or inactive, the necessary conditions for the optimal solution differ.

${\bf C{\scriptsize ASE}\;I{\scriptsize NACTIVE}}.$ Since the constraint $g$ has no influence on the optimum, we can simply set $\lambda = 0$. The task is then equivalent to unconstrained optimization, and only $\bigtriangledown f={\bf 0}$ is needed.

CASE  ACTIVE.{\bf C{\scriptsize ASE} \; A{\scriptsize CTIVE}}. Now the constraint condition gg is equivalent to

$$
g({\bf x})=0
$$

Notice that for every point ${\bf x}$ on the surface $g({\bf x})=0$, the gradient $\bigtriangledown g({\bf x})$ is orthogonal to the surface. Similarly, $\bigtriangledown f({\bf x}^*)$ must also be orthogonal to the surface, for otherwise moving along the surface could decrease $f$. So we can conclude that $\bigtriangledown f \in {\rm span}(\bigtriangledown g)$ at ${\bf x}^*$. That is to say, there exists $\lambda$ such that

$$
\bigtriangledown_{\bf x} f=-\lambda \bigtriangledown_{\bf x} g
$$

Note that $\lambda \geq 0$ must hold: since we want to minimize $f$, the gradient $\bigtriangledown f({\bf x}^*)$ (which points in the direction of fastest growth) should point toward the interior of $\mathbb{K}$, while $\bigtriangledown g$ points toward the outside of $\mathbb{K}$. Hence $\lambda$ must be non-negative, which is called dual feasibility. Similarly, if we wanted to maximize $f$, we would require $\lambda \leq 0$.

In both cases, either $\lambda$ or $g$ equals $0$, so $\lambda g({\bf x})=0$ always holds, which is called complementary slackness.

Thus we can summarize all the necessary conditions mentioned above as the KKT Conditions,

$$
\begin{aligned} \bigtriangledown_{\bf x} f + \lambda \bigtriangledown_{\bf x} g &= {\bf 0} \\ g({\bf x}) &\leq 0 \\ \lambda &\geq 0 \\ \lambda g({\bf x}) &= 0 \end{aligned}
$$

Similarly, we can also extend the conclusion to the general continuous optimization problem. The corresponding Lagrangian Function is defined as

$$
L({\bf x}, \overrightarrow \lambda, \overrightarrow \mu)=f({\bf x})+\overrightarrow \lambda\!^\top {\bf g}({\bf x}) + \overrightarrow \mu\!^\top {\bf h}({\bf x})
$$

And it is also convenient to write down the corresponding KKT Conditions:

$$
\begin{aligned} \bigtriangledown_{\bf x} f + \overrightarrow \lambda\!^\top \bigtriangledown_{\bf x}{\bf g}+\overrightarrow \mu\!^\top\bigtriangledown_{\bf x}{\bf h} &= {\bf 0} \\ g_k({\bf x}) &\leq 0, \;\; k=1,2,\cdots, m \\ h_l({\bf x}) &= 0, \;\; l=1,2,\cdots, p \\ \lambda_k &\geq 0, \;\; k=1,2,\cdots, m \\ \lambda_k g_k({\bf x}) &= 0, \;\; k=1,2,\cdots, m \end{aligned}
$$
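The general conditions above translate directly into a mechanical check. The sketch below is a hypothetical helper (`check_kkt` is not a library function) that verifies stationarity, primal and dual feasibility, and complementary slackness at a candidate point, given the gradients explicitly:

```python
import numpy as np

def check_kkt(grad_f, g_list, grad_g_list, h_list, grad_h_list, x, lam, mu, tol=1e-8):
    """Numerically verify the KKT conditions at a candidate point x
    with multipliers lam (inequalities) and mu (equalities)."""
    # Stationarity: grad f + sum_k lam_k grad g_k + sum_l mu_l grad h_l = 0
    r = np.asarray(grad_f(x), dtype=float)
    for lk, gg in zip(lam, grad_g_list):
        r = r + lk * gg(x)
    for ml, gh in zip(mu, grad_h_list):
        r = r + ml * gh(x)
    stationary = np.all(np.abs(r) < tol)
    # Primal feasibility: g_k(x) <= 0 and h_l(x) = 0
    primal = all(g(x) <= tol for g in g_list) and all(abs(h(x)) < tol for h in h_list)
    # Dual feasibility: lam_k >= 0
    dual = all(lk >= -tol for lk in lam)
    # Complementary slackness: lam_k * g_k(x) = 0
    slack = all(abs(lk * g(x)) < tol for lk, g in zip(lam, g_list))
    return bool(stationary and primal and dual and slack)

# Hypothetical check: minimize x1^2 + x2^2 subject to 1 - x1 - x2 <= 0.
# Candidate: x* = (0.5, 0.5) with lambda* = 1 (the constraint is active).
ok = check_kkt(
    grad_f=lambda x: 2 * x,
    g_list=[lambda x: 1 - x[0] - x[1]],
    grad_g_list=[lambda x: np.array([-1.0, -1.0])],
    h_list=[], grad_h_list=[],
    x=np.array([0.5, 0.5]), lam=[1.0], mu=[],
)
print(ok)  # True
```

Note that such a check only confirms the necessary conditions; it does not by itself prove optimality.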

${\bf N{\scriptsize OTE}}.$ For a point ${\bf x}^*$ satisfying the KKT Conditions to exist, the primal problem must satisfy certain regularity conditions (constraint qualifications), which are listed on Wikipedia.

${\bf E{\scriptsize XAMPLE}}.$ Minimize $x_1^2 + x_2^2$ subject to $x_1 + x_2=1$ and $x_2 \leq \alpha$, in which $x_1, x_2, \alpha \in \Re$.

${\bf S{\scriptsize OLUTION}}.$ The corresponding Lagrangian function is

$$
L({\bf x}, \lambda, \mu)=x_1^2 + x_2^2 +\lambda(x_2-\alpha)+\mu(x_1 + x_2 - 1)
$$

By the KKT Conditions, we must have

$$
\left \{ \begin{aligned} \frac{\partial L}{\partial x_1} = \frac{\partial L}{\partial x_2} &= 0 \\ x_1 + x_2 &= 1 \\ x_2 &\leq \alpha \\ \lambda &\geq 0 \\ \lambda(x_2 - \alpha) &= 0 \end{aligned} \right.
$$

which is equivalent to

$$
\left \{ \begin{aligned} x_1 &= -\frac{\mu}{2} \\ x_2 &= \frac{\mu}{2} + 1 \\ \mu &\leq -1 \\ \mu &\leq 2\alpha - 2 \\ (\mu + 1)(\mu - 2\alpha + 2) &= 0 \end{aligned} \right.
$$

where the last equation restates complementary slackness, since $\lambda = -2\mu - 2$ and $x_2 - \alpha = \frac{\mu - 2\alpha + 2}{2}$.

Now we can divide the problem into two cases according to whether $2\alpha - 2 \geq -1$, i.e. whether $\alpha \geq \dfrac{1}{2}$.

${\bf C{\scriptsize ASE}}\;\;\alpha \geq \dfrac{1}{2}.$ It is easy to verify that $\mu = -1$ satisfies all the KKT Conditions above (with $\lambda = 0$, so the inequality constraint is inactive). Hence when $x_1 = x_2 = \dfrac{1}{2}$, $x_1^2 + x_2^2$ attains its minimum value $\dfrac{1}{2}$.

${\bf C{\scriptsize ASE}}\;\;\alpha < \dfrac{1}{2}.$ Only $\mu=2\alpha - 2$ satisfies all the KKT Conditions above, so when $x_1=1-\alpha$ and $x_2 = \alpha$, $x_1^2 + x_2^2$ attains its minimum value $1 - 2\alpha + 2\alpha^2$.
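The second case can be checked numerically as well. The sketch below solves the example with `scipy.optimize.minimize` for the arbitrarily chosen value $\alpha = 0$ (so $\alpha < \frac{1}{2}$), where the analysis predicts $(x_1, x_2) = (1, 0)$ and minimum value $1$:

```python
import numpy as np
from scipy.optimize import minimize

# alpha chosen arbitrarily in the alpha < 1/2 regime.
alpha = 0.0
res = minimize(
    lambda x: x[0]**2 + x[1]**2,
    x0=[0.0, 0.0],
    method="SLSQP",
    constraints=[
        {"type": "eq", "fun": lambda x: x[0] + x[1] - 1.0},
        # SLSQP's "ineq" convention is fun(x) >= 0, so x2 <= alpha becomes alpha - x2 >= 0.
        {"type": "ineq", "fun": lambda x: alpha - x[1]},
    ],
)
print(res.x)    # close to (1 - alpha, alpha) = (1, 0)
print(res.fun)  # close to 1 - 2*alpha + 2*alpha^2 = 1
```

The inequality constraint is active at the solution, consistent with the boundary-solution case in the derivation above.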

  • Author: Panelatta
  • Link: https://bofc.tech/6c9565dd/
  • Copyright: Unless otherwise stated, all articles on this blog are licensed under the BY-NC-SA license. Please cite the source when reposting!