BorisBurkov.net

Quadratic programming

December 06, 2021

The polynomial-time solution of the quadratic programming problem, obtained by Soviet mathematicians in the late 1970s and early 1980s, paved the way for several important machine learning methods of the 1990s, such as Lasso/ElasticNet regression (with its L1 regularization) and Support Vector Machines (SVM). These methods are incredibly useful because they produce sparse solutions, effectively serving as a proxy for L0 regularization, in feature space and data point space respectively. Thanks to QP magic, they manage to do this in polynomial time, while a straightforward application of the L0 norm is NP-hard and cannot be done efficiently.

Problem statement

The general case of the quadratic programming problem is stated as follows.

Suppose that you have a vector of multiple variables \bf{x} = \begin{pmatrix} x_1 \\ x_2 \\ ... \\ x_n \end{pmatrix}.

You have a function of those variables, which is a sum of a quadratic term and a linear term: f({\bf x}) = \frac{1}{2} {\bf x}^T Q {\bf x} - {\bf x}^T {\bf b}.

Here the quadratic term is given by a quadratic form Q = \begin{pmatrix} q_{1,1} & q_{1,2} & ... & q_{1,n} \\ q_{2,1} & q_{2,2} & ... & q_{2,n} \\ ... & ... & \ddots & ... \\ q_{n,1} & q_{n,2} & ... & q_{n,n} \end{pmatrix}.

The linear term is given by a dot product of \bf{x} with the bias weights vector \bf{b} = \begin{pmatrix} b_1 \\ b_2 \\ ... \\ b_n \end{pmatrix}.

We also have a set of m constraints in the form of linear inequalities, which delimit the possible solutions to a convex subset of the input space:

A \bf{x} \le \bf{c}, where A = \begin{pmatrix} a_{1,1} & a_{1,2} & ... & a_{1,n} \\ ... \\ a_{m,1} & a_{m,2} & ... & a_{m,n} \end{pmatrix} and \bf{c} = \begin{pmatrix} c_1 \\ c_2 \\ ... \\ c_m \end{pmatrix}.

These constraints give rise to the Karush-Kuhn-Tucker (KKT) optimality conditions.
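
To make the problem statement concrete, here is a tiny two-variable QP instance written down with NumPy (all numbers are made up purely for illustration):

```python
import numpy as np

# A hypothetical 2-variable QP instance (values chosen arbitrarily):
# minimize f(x) = 1/2 x^T Q x - x^T b   subject to A x <= c
Q = np.array([[2.0, 0.0],
              [0.0, 2.0]])   # symmetric quadratic form
b = np.array([2.0, 4.0])     # linear bias vector
A = np.array([[1.0, 1.0]])   # one linear constraint: x1 + x2 <= 1
c = np.array([1.0])

def f(x):
    """Objective value 1/2 x^T Q x - x^T b."""
    return 0.5 * x @ Q @ x - x @ b

x = np.array([0.5, 0.5])            # a feasible point
assert np.all(A @ x <= c + 1e-12)   # the constraint A x <= c holds
print(f(x))                         # 0.5*1.0 - (1.0 + 2.0) = -2.5
```

The feasible set here is the half-plane below the line x1 + x2 = 1; quadratic programming looks for the minimizer of f over exactly such convex polyhedral sets.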

Solution in case of strict equality

If the constraints are strict equalities rather than inequalities, the solution becomes fairly straightforward.

We shall just apply the Lagrange multiplier method and find the conditional minimum of our system over the variables x_i and \lambda_i, where \Lambda = \begin{pmatrix} \lambda_1 \\ \lambda_2 \\ ... \\ \lambda_m \end{pmatrix} is the vector of Lagrange multipliers.

Just add the terms \lambda_i (A_i {\bf x} - c_i) to f({\bf x}) = \frac{1}{2} {\bf x}^T Q {\bf x} - {\bf x}^T {\bf b} and take the partial derivatives with respect to each x_i, equating them to zero (we assume Q symmetric, so that \frac{\partial}{\partial x_i} \frac{1}{2} {\bf x}^T Q {\bf x} = q_{i,1} x_1 + ... + q_{i,n} x_n):

\frac{\partial \left( \frac{1}{2} {\bf x}^T Q {\bf x} - (b_1 x_1 + ... + b_n x_n) + \lambda_1 (a_{1,1} x_1 + ... + a_{1,n} x_n - c_1) + ... + \lambda_m (a_{m,1} x_1 + ... + a_{m,n} x_n - c_m) \right)}{\partial x_i} = 0 \implies

\implies q_{i,1} x_1 + ... + q_{i,n} x_n - b_i + \lambda_1 a_{1,i} + \lambda_2 a_{2,i} + ... + \lambda_m a_{m,i} = 0

and we also have the constraint equations a_{i,1} x_1 + a_{i,2} x_2 + ... + a_{i,n} x_n = c_i themselves.
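
As a sanity check, the stationarity condition above can be verified numerically: for a symmetric Q, the analytic gradient Q x - b + A^T \lambda of the Lagrangian should match a finite-difference gradient. A small sketch (toy values, again arbitrary):

```python
import numpy as np

# Toy data (arbitrary numbers, Q symmetric)
Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
c = np.array([1.0])
lam = np.array([0.7])  # some fixed multiplier value

def lagrangian(x):
    # L(x, lambda) = 1/2 x^T Q x - x^T b + lambda^T (A x - c)
    return 0.5 * x @ Q @ x - x @ b + lam @ (A @ x - c)

def analytic_grad(x):
    # dL/dx_i = sum_j q_{i,j} x_j - b_i + sum_k lambda_k a_{k,i}  (Q symmetric)
    return Q @ x - b + A.T @ lam

# Central finite differences along each coordinate axis
x0 = np.array([0.3, -0.2])
eps = 1e-6
num_grad = np.array([
    (lagrangian(x0 + eps * e) - lagrangian(x0 - eps * e)) / (2 * eps)
    for e in np.eye(2)
])
assert np.allclose(num_grad, analytic_grad(x0), atol=1e-6)
```

This confirms the per-coordinate derivative formula derived above before we assemble the full linear system.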

Thus, the solution takes the form of a system of n+m linear equations in n+m variables (\bf{x} and \bf{\lambda}):

\begin{pmatrix} q_{1,1} & ... & q_{1,n} & a_{1,1} & ... & a_{m,1} \\ ... \\ q_{n,1} & ... & q_{n,n} & a_{1,n} & ... & a_{m,n} \\ a_{1,1} & ... & a_{1,n} & 0 & ... & 0 \\ ... \\ a_{m,1} & ... & a_{m,n} & 0 & ... & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ ... \\ x_n \\ \lambda_1 \\ ... \\ \lambda_m \end{pmatrix} = \begin{pmatrix} b_1 \\ ... \\ b_n \\ c_1 \\ ... \\ c_m \end{pmatrix}

or, in short notation:

\begin{pmatrix} Q & A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} \bf{x} \\ \bf{\lambda} \end{pmatrix} = \begin{pmatrix} \bf{b} \\ \bf{c} \end{pmatrix}

Thus, in the equality-constrained case the whole problem reduces to solving a single system of linear equations.
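
In code, that reduction is literally one call to a linear solver. A minimal sketch with NumPy (toy values; assumes Q is symmetric and the block KKT matrix is nonsingular):

```python
import numpy as np

def solve_equality_qp(Q, b, A, c):
    """Minimize 1/2 x^T Q x - x^T b subject to A x = c by solving
    the block system  [[Q, A^T], [A, 0]] [x; lambda] = [b; c]."""
    n, m = Q.shape[0], A.shape[0]
    KKT = np.block([[Q, A.T],
                    [A, np.zeros((m, m))]])
    rhs = np.concatenate([b, c])
    sol = np.linalg.solve(KKT, rhs)
    return sol[:n], sol[n:]  # x, lambda

# Toy example (arbitrary numbers): minimize x1^2 + x2^2 subject to x1 + x2 = 1,
# i.e. find the point on the line x1 + x2 = 1 closest to the origin.
Q = np.array([[2.0, 0.0],
              [0.0, 2.0]])
b = np.array([0.0, 0.0])
A = np.array([[1.0, 1.0]])
c = np.array([1.0])

x, lam = solve_equality_qp(Q, b, A, c)
assert np.allclose(Q @ x + A.T @ lam, b)  # stationarity: Qx + A^T lambda = b
assert np.allclose(A @ x, c)              # feasibility: Ax = c
print(x)  # -> [0.5 0.5]
```

For the inequality-constrained case this single solve is no longer enough, which is where the actual QP algorithms (active-set, interior-point) come in.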

Boris Burkov

Written by Boris Burkov, who lives in Moscow, Russia and Cambridge, UK, loves taking part in the development of cutting-edge technologies, reflects on how the world works and admires the giants of the past. You can follow me on Telegram.