Cochran's theorem
June 30, 2021 18 min read
Here I discuss Cochran's theorem, which is used to prove the independence of quadratic forms of random variables, such as the sample variance and the sample mean.
Tearing through the unintelligible formulation
Cochran’s theorem is a field-specific result, sometimes used in the analysis of chi-squared and multivariate normal distributions.
Probably, the two main reasons to care about it are the proof of independence between the sample variance and the sample mean in the derivation of Student's t-distribution, and ANOVA, where the total variance is split into multiple sources of variance, which are usually expressed as quadratic forms.
As an example of the latter case (maybe not the most relevant one), consider the famous bias-variance tradeoff, where the total expected error of a regression model is divided into a sum of two sources of error: squared bias (the systematic error due to a crude model being unable to fit the more complex nature of the data) and variance (the error created by the fact that the amount of available data is limited and somewhat insufficient for the model to learn to fit the data perfectly).
Anyway, the formulation of the theorem sounds really technical and is hard to digest. Basically, it says that if you have a sum of squares of $n$ independent identically distributed standard normal variables $\sum_{i=1}^{n} Z_i^2$ and manage to split it into several non-full-rank quadratic forms $Q_1 + Q_2 + \dots + Q_k$, where the sum of their ranks equals $n$, then each of those quadratic forms $Q_j$ is an independent random variable, distributed as $\chi^2_{r_j}$, a chi-square with $r_j$ degrees of freedom, where $r_j$ is the rank of the corresponding quadratic form.
For instance, if we've split our sum of squares of i.i.d. normal variables with 0 mean and unit variance (so that $\sum_{i=1}^{n} Z_i^2 \sim \chi^2_n$) into 2 quadratic forms $Q_1 = Z^T B_1 Z$ and $Q_2 = Z^T B_2 Z$, with the matrix $B_1$ having rank $r_1$ and $B_2$ having rank $r_2 = n - r_1$, then $Q_1 \sim \chi^2_{r_1}$, $Q_2 \sim \chi^2_{r_2}$, and $Q_1$ and $Q_2$ are independent.
See the examples below to understand why this result is valuable.
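To make this concrete before diving into the proof, here is a minimal numerical sanity check (my own sketch, not part of the original argument; it assumes numpy and scipy are available). It splits $\sum_{i=1}^{n} Z_i^2$ into the rank-1 form $Z^T \frac{1}{n}J Z$ built from the sample mean and its rank-$(n-1)$ complement $Z^T (I - \frac{1}{n}J) Z$, and checks empirically that the two parts look chi-square-distributed with $1$ and $n-1$ degrees of freedom and are uncorrelated:

```python
# A minimal numerical sanity check of Cochran's theorem (illustrative sketch).
# Split sum(Z_i^2) for i.i.d. standard normals into two quadratic forms:
#   Q1 = Z^T (J/n) Z      (rank 1)   -> chi^2 with 1 degree of freedom
#   Q2 = Z^T (I - J/n) Z  (rank n-1) -> chi^2 with n-1 degrees of freedom
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, n_samples = 5, 100_000

B1 = np.ones((n, n)) / n          # projection onto the all-ones direction, rank 1
B2 = np.eye(n) - B1               # complementary projection, rank n-1

Z = rng.standard_normal((n_samples, n))
Q1 = np.einsum('si,ij,sj->s', Z, B1, Z)   # Z^T B1 Z for each sample
Q2 = np.einsum('si,ij,sj->s', Z, B2, Z)   # Z^T B2 Z for each sample

# Q1 should look like chi^2_1, Q2 like chi^2_{n-1}, and they should be uncorrelated.
print(np.linalg.matrix_rank(B1), np.linalg.matrix_rank(B2))   # 1, n-1
print(stats.kstest(Q1, 'chi2', args=(1,)).statistic)          # small
print(stats.kstest(Q2, 'chi2', args=(n - 1,)).statistic)      # small
print(np.corrcoef(Q1, Q2)[0, 1])                              # close to 0
```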
Cochran’s theorem proof
Here I’ll outline the proof of Cochran’s theorem.
The central piece of the proof is the following lemma, which is a result from pure linear algebra, not probability theory - it deals with matrices and real numbers, not random variables. Once we are done with the lemma, the proof of the theorem itself gets pretty straightforward.
Lemma
Suppose that we have an n-dimensional vector $y = (y_1, y_2, \dots, y_n)^T$ and a quadratic form $y^T y = \sum_{i=1}^{n} y_i^2$.
Suppose that we found a way to split this quadratic form into several: $y^T y = y^T B_1 y + y^T B_2 y + \dots + y^T B_k y$, where the matrices $B_i$ have lower ranks $r_i$, so that the sum of those ranks equals n: $\sum_{i=1}^{k} r_i = n$.
Then all those matrices can be simultaneously diagonalized: there exists an orthogonal matrix $E$ of their joint eigenvectors, and after diagonalizing, we get:
$y^T y = x^T \Lambda_1 x + x^T \Lambda_2 x + \dots + x^T \Lambda_k x$, where $x = E^T y$ is such a transform of the vector $y$ that in the resulting quadratic forms the matrices $\Lambda_i = E^T B_i E$ are diagonal.
As each matrix $\Lambda_i$ contains only $r_i$ non-zero eigenvalues: $x^T \Lambda_i x = \sum_{j} \lambda_j x_j^2$, where $j$ starts at $r_1 + \dots + r_{i-1} + 1$ and ends at $r_1 + \dots + r_i$, in each such expression only $r_i$ of the coordinates of $x$ actually matter.
Moreover, and importantly, all the non-zero eigenvalues equal 1, so each quadratic form actually ends up being just a sum of squares $\sum_{j} x_j^2$; applied to i.i.d. standard normal variables, this means it is chi-square-distributed.
Preparations for the lemma: eigenvalues and eigenvectors of lower-rank matrices
As you can see, the statement of the lemma deals with lower-rank matrices and their eigendecomposition. We need to learn how to work with them in order to understand the proof of the lemma.
For starters, let us first consider a rank 1 matrix, e.g. $A = \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}$.
It has to have 2 eigenvalues.
We can represent it as an outer product of two vectors $u$ and $v$ (here $u = v = (1, 2)^T$): $A = u v^T$.
Then one of its eigenvalues is $\lambda_1 = v^T u = 5$ because $A u = u v^T u = (v^T u) u$, and $u$ is the corresponding eigenvector.
As for the rest of the eigenvalues, they equal zero, and the corresponding eigenvectors have to cover the remaining space of dimensionality $(n-1)$. For instance, it is clear that $(2, -1)^T$ is an eigenvector with eigenvalue 0.
Obviously, in the case of a matrix of rank 1, we have just one linearly independent equation, which makes all but one of the coordinates of the eigenvectors corresponding to eigenvalue 0 arbitrary, while the last one is determined by that row.
Now, suppose that the dimension of your matrix is $n = 4$ and the rank of your matrix is $r = 2$, for instance: $A = \begin{pmatrix} 1 & 2 & 0 & 0 \\ 2 & 4 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$.
Then 2 eigenvectors are fixed and correspond to the non-zero eigenvalues $\lambda_1 = 5$ and $\lambda_2 = 1$, while a 2-dimensional subspace is left, corresponding to the eigenvalues $\lambda_3 = \lambda_4 = 0$. Indeed, if you try solving $Ax = \lambda x$ with $\lambda = \lambda_3 = 0$ or $\lambda = \lambda_4 = 0$, you can clearly see that you'll have just 2 constraints on the solutions: $x_1 + 2 x_2 = 0$ and $x_3 = 0$, which allows for an eigenspace of dimensionality 2. You can choose arbitrary values for $x_2$ and $x_4$; then $x_1 = -2 x_2$ and $x_3 = 0$.
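If you want to poke at such rank-deficient matrices yourself, here is a short numpy sketch of mine that simply re-checks the two examples above numerically:

```python
# Numerically checking the rank-deficient examples above (illustrative sketch).
import numpy as np

# Rank-1 case: the 2x2 matrix is the outer product u u^T with u = (1, 2)^T.
u = np.array([1.0, 2.0])
A1 = np.outer(u, u)                      # [[1, 2], [2, 4]]
print(np.linalg.eigh(A1)[0])             # [0, 5]: eigenvalues 0 and u^T u = 5
print(A1 @ np.array([2.0, -1.0]))        # [0, 0]: (2, -1) is an eigenvector for eigenvalue 0

# Rank-2 case in 4 dimensions: non-zero eigenvalues 5 and 1,
# plus a 2-dimensional eigenspace for the eigenvalue 0.
A2 = np.array([[1.0, 2.0, 0.0, 0.0],
               [2.0, 4.0, 0.0, 0.0],
               [0.0, 0.0, 1.0, 0.0],
               [0.0, 0.0, 0.0, 0.0]])
print(np.linalg.matrix_rank(A2))              # 2
print(np.round(np.linalg.eigh(A2)[0], 6))     # [0, 0, 1, 5]
print(A2 @ np.array([-2.0, 1.0, 0.0, 7.0]))   # zero vector: x2, x4 arbitrary, x1 = -2*x2, x3 = 0
```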
Preparations for the lemma: simultaneous diagonalization
Sometimes two lower-rank matrices can be simultaneously diagonalized.
Example: say, you have 2 matrices, $B_1 = \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}$, $B_2 = \begin{pmatrix} 4 & -2 \\ -2 & 1 \end{pmatrix}$.
For matrix $B_1$ the eigenvectors are $(1, 2)^T$ with eigenvalue $\lambda_1 = 5$ and $(2, -1)^T$ with eigenvalue $\lambda_2 = 0$; the eigenspace corresponding to $\lambda_2 = 0$ is $(2c, -c)^T$, where $c$ is an arbitrary value.
For matrix $B_2$ the eigenvectors are $(2, -1)^T$ with eigenvalue $\lambda_1 = 5$ and $(1, 2)^T$ with eigenvalue $\lambda_2 = 0$; the eigenspace corresponding to $\lambda_2 = 0$ is $(c, 2c)^T$, where $c$ is an arbitrary value.
Thus, we can simultaneously diagonalize these two matrices, because the eigenvectors of matrix $B_1$ with non-zero eigenvalues are compatible with the zero eigenspace of $B_2$ and vice versa.
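Here is a quick numpy check of this example (again my own sketch); it also verifies that the two matrices commute, which is the usual criterion for sharing an eigenbasis:

```python
# Checking that B1 and B2 above are simultaneously diagonalizable (illustrative sketch).
import numpy as np

B1 = np.array([[1.0, 2.0], [2.0, 4.0]])
B2 = np.array([[4.0, -2.0], [-2.0, 1.0]])

# A shared orthonormal eigenbasis: (1, 2)/sqrt(5) and (2, -1)/sqrt(5) as columns.
E = np.column_stack([[1.0, 2.0], [2.0, -1.0]]) / np.sqrt(5.0)

print(np.round(E.T @ B1 @ E, 10))        # diag(5, 0)
print(np.round(E.T @ B2 @ E, 10))        # diag(0, 5)
print(np.round(B1 @ B2 - B2 @ B1, 10))   # zero commutator, consistent with a shared eigenbasis
```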
Now, we won't be using the following results below, but I'll still mention them. Simultaneous diagonalization of lower-rank matrices is NOT always possible in general. However, there is always a way to do it for two symmetric matrices of the same size if one of them is positive-definite, using the Cholesky decomposition. This result is well-known because it has important practical applications in Lagrangian and quantum mechanics. In quantum mechanics two operators can be observed simultaneously if they commute (i.e. their eigenfunctions are the same, or, in other words, they are simultaneously diagonalizable).
Proof of the lemma
Let us start with the case of two quadratic forms ($k = 2$). We start with:
$y^T y = y^T B_1 y + y^T B_2 y$
Do an eigendecomposition of $B_1$: $B_1 = E_1 \Lambda_1 E_1^T$ or, equivalently, $\Lambda_1 = E_1^T B_1 E_1$.
Note that as the rank of the matrix $B_1$ is not full, the matrix $E_1$ of its eigenvectors has an eigenspace corresponding to the 0 eigenvalue, where we can choose an arbitrary basis. Let us do this in such a way that the resulting matrix $E_1$ is full-rank orthogonal (this is possible because $B_1$ is symmetric).
Now, denote $x = E_1^T y$ and recall that $E_1 E_1^T = E_1^T E_1 = I$. Then:
$y^T y = y^T E_1 E_1^T y = x^T x = x^T \Lambda_1 x + y^T B_2 y$
or, equivalently,
$\sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} \lambda_i x_i^2 + y^T B_2 y$
Now, if the rank of $B_1$ is $r_1 < n$ (for instance, $r_1 = 2$), we can re-arrange this as follows:
$\sum_{i=1}^{n} x_i^2 = \lambda_1 x_1^2 + \lambda_2 x_2^2 + y^T B_2 y$
Rearrange the terms to get:
$y^T B_2 y = (1 - \lambda_1) x_1^2 + (1 - \lambda_2) x_2^2 + \sum_{i=3}^{n} x_i^2$
Now that we know that the rank of $B_2$ is $n - 2$, we can come to the conclusion that the rank of $E_1^T B_2 E_1$ is $n - 2$ as well.
Indeed, recall that $y^T B_2 y = x^T E_1^T B_2 E_1 x$ and that $E_1$ is a full-rank (orthogonal) matrix. So, $\mathrm{rank}(E_1^T B_2 E_1) = \mathrm{rank}(B_2) = n - 2$, since multiplication by a full-rank matrix does not change the rank (see wiki).
As a result we have:
$x^T (E_1^T B_2 E_1) x = (1 - \lambda_1) x_1^2 + (1 - \lambda_2) x_2^2 + \sum_{i=3}^{n} x_i^2$, and since this holds for every $x$ and both matrices are symmetric, $E_1^T B_2 E_1 = \mathrm{diag}(1 - \lambda_1, 1 - \lambda_2, 1, \dots, 1)$, a diagonal matrix of rank $n - 2$.
There is only one way for this matrix to have rank $n - 2$ - all the non-zero eigenvalues of $B_1$ should equal 1: $\lambda_1 = \lambda_2 = 1$. Hence $y^T B_1 y = x_1^2 + x_2^2$ and $y^T B_2 y = \sum_{i=3}^{n} x_i^2$, which is exactly the claim of the lemma for two quadratic forms.
Now, what if the number of quadratic forms is greater than 2, e.g. equals 3? The key observation for this case is the fact that rank is subadditive: $\mathrm{rank}(B_2 + B_3) \le \mathrm{rank}(B_2) + \mathrm{rank}(B_3)$. So we can be sure that $B_2 + B_3$ is a matrix of rank no greater than $r_2 + r_3 = n - r_1$. Hence, we can apply the argument above to the pair $B_1$ and $B_2 + B_3$, then disregard the first $r_1$ coordinates, from $x_1$ to $x_{r_1}$, and apply the same argument again for the remaining quadratic forms $y^T B_2 y$ and $y^T B_3 y$.
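Before moving on, here is a small numerical illustration of the lemma (my own sketch with an arbitrarily chosen split, not code from the original proof): take $B_1$ to be a rank-$r_1$ projection, $B_2 = I - B_1$, and check that in the joint eigenbasis both quadratic forms become plain sums of squares with all non-zero eigenvalues equal to 1:

```python
# Numerical illustration of the lemma (illustrative sketch):
# B1 = projection of rank r1, B2 = I - B1, so that y^T y = y^T B1 y + y^T B2 y
# and rank(B1) + rank(B2) = n. All non-zero eigenvalues must then equal 1.
import numpy as np

rng = np.random.default_rng(1)
n, r1 = 6, 2

# Random orthonormal basis; B1 projects onto the first r1 basis vectors.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
B1 = Q[:, :r1] @ Q[:, :r1].T
B2 = np.eye(n) - B1

print(np.linalg.matrix_rank(B1), np.linalg.matrix_rank(B2))   # 2, 4

# Joint eigenbasis E (here simply the eigenvectors of B1, since B2 = I - B1 shares them).
_, E = np.linalg.eigh(B1)
L1 = np.round(E.T @ B1 @ E, 10)   # diagonal, non-zero entries equal to 1
L2 = np.round(E.T @ B2 @ E, 10)   # diagonal, non-zero entries equal to 1
print(np.diag(L1), np.diag(L2))

# In the transformed coordinates x = E^T y, each form is a plain sum of squares.
y = rng.standard_normal(n)
x = E.T @ y
print(np.isclose(y @ B1 @ y, np.sum(np.diag(L1) * x**2)))   # True
print(np.isclose(y @ B2 @ y, np.sum(np.diag(L2) * x**2)))   # True
```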
Theorem proof
Now that we have proven the lemma, which constitutes the core of Cochran’s theorem, we can apply it to our random variables.
By analogy with the lemma, we apply an orthogonal transform $Y = E^T Z$ to our random vector $Z$ of i.i.d. standard normal variables, so that our sum of quadratic forms takes the following form:
$Q_1 = Y_1^2 + Y_2^2 + \dots + Y_{r_1}^2$
$Q_2 = Y_{r_1+1}^2 + \dots + Y_{r_1+r_2}^2$
…
$Q_k = Y_{n-r_k+1}^2 + \dots + Y_n^2$
Let us now show that all the random variables $Y_i$ are independent. Recall that the covariance matrix of $Z$ is $\Sigma_Z = I$, since all the $Z_i$ are independent standard normal variables.
Now, if $Y = E^T Z$, where $E$ is an orthogonal matrix, the covariance matrix of $Y$ is the expectation of an outer product:
$\Sigma_Y = \mathbb{E}[Y Y^T] = \mathbb{E}[E^T Z Z^T E] = E^T \, \mathbb{E}[Z Z^T] \, E = E^T I E = I$.
So, all $Y_i$ are independent identically distributed standard normal random variables (uncorrelated jointly normal variables are independent).
Since every $Y_i$ occurs in exactly one $Q_j$ and the $Y_i$'s are all independent random variables (because $E$ is an orthogonal matrix), Cochran's theorem follows: the $Q_j$ are independent, and each of them, being a sum of $r_j$ squares of i.i.d. standard normal variables, is $\chi^2_{r_j}$-distributed.
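As a quick empirical check of this step (a sketch of mine, using a random orthogonal matrix in place of the specific $E$ from the lemma): after an orthogonal transform, the sample covariance of $Y = E^T Z$ is still close to the identity, so the $Y_i$ stay uncorrelated and, being jointly normal, independent:

```python
# Empirical check that an orthogonal transform preserves the identity covariance (sketch).
import numpy as np

rng = np.random.default_rng(2)
n, n_samples = 4, 200_000

E, _ = np.linalg.qr(rng.standard_normal((n, n)))   # a random orthogonal matrix
Z = rng.standard_normal((n_samples, n))            # rows are i.i.d. N(0, I) vectors
Y = Z @ E                                          # each row equals E^T z for that sample

print(np.round(np.cov(Y, rowvar=False), 2))        # approximately the identity matrix
```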
Example: Application to ANOVA
Total sum of squares (SSTO) can be split into two terms: $\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 + \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$ (the cross term vanishes).
Thus, $SSTO = SSE + SSR$.
Now, both terms of the sum can be represented in matrix notation as quadratic forms:
$SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = Y^T (I - H) Y$,
$SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 = Y^T (H - \tfrac{1}{n} J) Y$,
where $H = X (X^T X)^{-1} X^T$ is the hat matrix of the regression and $J$ is the $n \times n$ matrix of ones.
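Here is a rough numerical sketch of this decomposition (my own code; it assumes the simple-linear-regression setting and the hat-matrix notation $H$ and $J$ used above, which is my reading of this example). It verifies the identity $SSTO = SSE + SSR$ and the ranks of the corresponding quadratic-form matrices on toy data:

```python
# Sketch: SSTO = SSE + SSR as quadratic forms in Y (simple linear regression example).
import numpy as np

rng = np.random.default_rng(3)
n = 50
x = rng.uniform(0, 10, size=n)
Y = 2.0 + 0.5 * x + rng.standard_normal(n)          # toy data

X = np.column_stack([np.ones(n), x])                # design matrix with intercept
H = X @ np.linalg.inv(X.T @ X) @ X.T                # hat matrix
J = np.ones((n, n))

SSTO = Y @ (np.eye(n) - J / n) @ Y                  # rank n-1
SSE  = Y @ (np.eye(n) - H) @ Y                      # rank n-2
SSR  = Y @ (H - J / n) @ Y                          # rank 1

print(np.isclose(SSTO, SSE + SSR))                  # True
print(np.linalg.matrix_rank(np.eye(n) - H),         # n-2
      np.linalg.matrix_rank(H - J / n))             # 1
```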
Application to sample mean and sample variance
TODO; see wikipedia
References
- https://en.wikipedia.org/wiki/Cochran%27s_theorem
- https://brilliant.org/wiki/multivariate-normal-distribution/
- http://www.stat.columbia.edu/~fwood/Teaching/w4315/Fall2009/lecture_cochran.pdf
- https://www.youtube.com/watch?v=toNiUsay5uU
- https://math.stackexchange.com/questions/55165/eigenvalues-of-the-rank-one-matrix-uvt
- https://en.wikipedia.org/wiki/Definite_matrix#Simultaneous_diagonalization
- https://en.wikipedia.org/wiki/Diagonalizable_matrix#Simultaneous_diagonalization
- https://www.math.purdue.edu/~eremenko/dvi/simult.pdf
- https://en.wikipedia.org/wiki/Rank_(linear_algebra)#Properties
Written by Boris Burkov, who lives in Moscow, Russia, loves to take part in the development of cutting-edge technologies, reflects on how the world works and admires the giants of the past. You can follow me on Telegram.