BorisBurkov.net

Wishart, matrix Gamma, Hotelling T-squared, Wilks' Lambda distributions

July 13, 2021 5 min read

In this post I'll briefly cover the multivariate analogues of gamma-distribution-related univariate statistical distributions.

Wishart distribution

Wishart distribution is a multivariate analogue of Chi-square distribution.

Suppose that you have a multivariate normal distribution, but you don’t know its mean and covariance matrix.

Denote by $\bm{X} = (x_1, x_2, ..., x_p)^T$ a Gaussian vector, where $x_i$ are $p$ individual one-dimensional Gaussian random variables with zero expectation, and their covariance matrix $\Sigma$ is unknown.

Similarly to how we tried to infer the sample variance from the empirical data in the univariate case (and found out that it is chi-square-distributed), we might want to try to infer the covariance matrix from $n$ sample points of our multivariate normal distribution.

Our observations then are $n$ i.i.d. $p$-vector random variables $\bm{X_i}$.

Then a random matrix, formed by the sum of outer products of those random variables, has a Wishart distribution with $n$ degrees of freedom and covariance matrix $\Sigma$: $M_{(p \times p)} = \sum \limits_{i=1}^{n} \bm{X_i} \bm{X_i}^T \sim \mathcal{W}_p(n, \Sigma)$.
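The construction above can be checked by simulation. Below is a minimal sketch, assuming NumPy is available; the covariance matrix `sigma`, the sizes `p`, `n` and the number of draws are arbitrary values picked for illustration. It builds many Wishart matrices as sums of outer products and compares their average with $n \Sigma$, which is the expectation of $\mathcal{W}_p(n, \Sigma)$.

```python
import numpy as np

rng = np.random.default_rng(0)

p, n = 3, 10          # dimensionality and degrees of freedom (illustrative)
n_draws = 20000       # number of simulated Wishart matrices (illustrative)

# Hypothetical covariance matrix, chosen only for this example.
sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])

# x[k, i, :] is the i-th zero-mean Gaussian p-vector of the k-th experiment.
x = rng.multivariate_normal(np.zeros(p), sigma, size=(n_draws, n))

# For every experiment k, sum the outer products X_i X_i^T over i.
m = np.einsum("kip,kiq->kpq", x, x)

empirical_mean = m.mean(axis=0)
print(np.round(empirical_mean, 2))   # should be close to n * sigma
print(n * sigma)
```

Each `m[k]` is a symmetric $p \times p$ matrix, and the average over many draws should approach $n \Sigma$ as the number of draws grows.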

Probability density function of this distribution is:

$f_{\xi}(M) = \frac{1}{2^{np/2} \Gamma_p(\frac{n}{2}) |\Sigma|^{n/2}} |M|^{(n-p-1)/2} e^{-\frac{\text{trace}(\Sigma^{-1}M)}{2}}$, where $\Gamma_p$ is the multivariate gamma function, and $|M|$ and $|\Sigma|$ are determinants of the respective matrices.
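To make the formula more tangible, here is a sketch that evaluates its logarithm term by term and compares the result with SciPy's `scipy.stats.wishart.logpdf`. The helper name `wishart_logpdf` and the matrices `sigma` and `m` are made up for illustration; the only library calls used are `numpy.linalg.slogdet`, `numpy.linalg.solve`, `scipy.special.multigammaln` and `scipy.stats.wishart.logpdf`.

```python
import numpy as np
from scipy.special import multigammaln
from scipy.stats import wishart


def wishart_logpdf(m, n, sigma):
    """Log of the Wishart density, written term by term from the formula."""
    p = sigma.shape[0]
    _, logdet_m = np.linalg.slogdet(m)          # log |M|
    _, logdet_sigma = np.linalg.slogdet(sigma)  # log |Sigma|
    trace_term = np.trace(np.linalg.solve(sigma, m))  # trace(Sigma^{-1} M)
    return (0.5 * (n - p - 1) * logdet_m
            - 0.5 * trace_term
            - 0.5 * n * p * np.log(2.0)
            - multigammaln(0.5 * n, p)          # log Gamma_p(n / 2)
            - 0.5 * n * logdet_sigma)


# Hypothetical positive definite matrices, chosen only for illustration.
sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
m = np.array([[12.0, 3.0, 1.0],
              [3.0, 8.0, 2.0],
              [1.0, 2.0, 15.0]])
n = 7

manual = wishart_logpdf(m, n, sigma)
reference = wishart.logpdf(m, df=n, scale=sigma)
print(manual, reference)  # the two values should agree
```

Working with log-determinants instead of determinants avoids overflow for larger matrices, which is why the sketch computes the log-density rather than the density itself.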

TODO: meaning of inverse matrix, trace, ratio of determinants

TODO: use in Bayesian statistics as conjugate prior

Matrix Gamma distribution

Matrix Gamma distribution to Wishart distribution in multivariate case is what gamma distribution is to Chi-square distribution in univariate case.
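For reference, here is that relation written out. The parametrisation with shape $\alpha$ and scale $\beta$ is the commonly used one and is an assumption on my part, since this post does not fix a notation for the matrix Gamma distribution:

$$
f(X; \alpha, \beta, \Sigma) = \frac{|\Sigma|^{-\alpha}}{\beta^{p \alpha} \Gamma_p(\alpha)} |X|^{\alpha - \frac{p+1}{2}} e^{-\frac{\text{trace}(\Sigma^{-1} X)}{\beta}}
$$

Setting $\alpha = \frac{n}{2}$ and $\beta = 2$ gives back the Wishart density $\mathcal{W}_p(n, \Sigma)$, just like a univariate gamma distribution with shape $\frac{n}{2}$ and scale $2$ is a chi-square distribution with $n$ degrees of freedom.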

Not much to say about it, really, because it is not very useful on its own.

Hotelling T-squared distribution

Hotelling T-squared distribution is often used as a multivariate analogue of t-Student.

Remember how, in the case of the univariate normal distribution, we considered a random variable $\frac{\bar{X} - \mu}{\sqrt{S^2}}$, where $\bar{X} = \frac{\sum \limits_{i=1}^{n} X_i}{n}$ is the sample mean, $\mu$ is the true (distribution) mean, and $S^2 = \frac{1}{n-1} \sum \limits_{i=1}^{n} (X_i - \bar{X})^2$ is the sample variance?

We showed that its square, multiplied by $n$, $\frac{n (\bar{X} - \mu)^2}{S^2}$, has a Snedecor-Fisher F distribution with $1$ and $n-1$ degrees of freedom.
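A quick numeric sanity check of this univariate fact: with the usual $\sqrt{n}$ scaling, the statistic $\frac{\sqrt{n} (\bar{X} - \mu)}{\sqrt{S^2}}$ follows a t-Student distribution with $n-1$ degrees of freedom, so its square should follow $F_{1, n-1}$. The sketch below, assuming SciPy is available, compares the corresponding critical values; the sample size `n` and the level `alpha` are arbitrary.

```python
from scipy.stats import f, t

n = 12         # illustrative sample size
df = n - 1     # degrees of freedom of the t statistic
alpha = 0.05   # illustrative significance level

# Two-sided critical value of the t statistic ...
t_crit = t.ppf(1 - alpha / 2, df)
# ... and the one-sided critical value of its square under F(1, n-1).
f_crit = f.ppf(1 - alpha, 1, df)

print(t_crit ** 2, f_crit)  # the two numbers should coincide
```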

In the case of the multivariate normal distribution we generalize the sample variance $S^2$ with a sample covariance matrix $\hat{\Sigma} = \frac{1}{n-1} \sum \limits_{i=1}^{n} (\bm{X_i} - \bm{\bar{X}}) (\bm{X_i} - \bm{\bar{X}})^T$ (which approximates the real covariance matrix $\Sigma$). While the sample variance was chi-square-distributed (up to a scaling factor), the sample covariance matrix is, again up to a scaling factor, Wishart-distributed (a matrix generalization of the chi-square distribution).

In the multivariate case we consider a quadratic form $(\bm{\bar{X}} - \bm{\mu})^T \hat{\Sigma}^{-1} (\bm{\bar{X}} - \bm{\mu})$. Note how this random variable corresponds to $\frac{(\bar{X} - \mu)^2}{S^2}$ in the univariate case: we replace division by the variance with multiplication by the inverse sample covariance matrix.

It turns out that, again, the vector of sample means $\bm{\bar{X}}$ is independent of the sample covariance matrix $\hat{\Sigma}$ (a result similar to the univariate Cochran’s theorem).

It also turns out that, again:

$\sqrt{n} (\bm{\bar{X}} - \bm{\mu}) \sim \mathcal{N}_p(\bm{0}, \Sigma)$, where $\mathcal{N}_p(\bm{0}, \Sigma)$ is a $p$-variate normal distribution with $\bm{0}$ vector of means and $\Sigma$ covariance matrix;

$(n-1) \hat{\Sigma} \sim \mathcal{W}_p(n-1, \Sigma)$, where $\mathcal{W}_p(n-1, \Sigma)$ is a $p$-dimensional Wishart distribution with $n-1$ degrees of freedom and $\Sigma$ covariance matrix;

$\frac{n-p}{p} \frac{n}{n-1} (\bm{\bar{X}} - \bm{\mu})^T \hat{\Sigma}^{-1} (\bm{\bar{X}} - \bm{\mu}) \sim F_{p, n-p}$, where $n$ is the number of sample points, $p$ is the dimensionality of the data, and $F_{p, n-p}$ is the Fisher-Snedecor distribution with $p$ and $n-p$ degrees of freedom.

The Hotelling $T^2$ statistic is defined as $t^2 = n (\bm{\bar{X}} - \bm{\mu})^T \hat{\Sigma}^{-1} (\bm{\bar{X}} - \bm{\mu}) \sim T^2_{p, n-1}$, where $T^2_{p, n-1}$ is the Hotelling $T^2$ distribution with parameters $p$ and $n-1$, so that $t^2 \sim \frac{p (n-1)}{n-p} F_{p,n-p}$.
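In practice this gives a one-sample test for the mean vector. Here is a minimal sketch, assuming NumPy and SciPy; the function name `hotelling_t2_test`, the simulated data and the hypothesised means are all invented for illustration. It computes $t^2$, rescales it to an F statistic and reads the p-value off the $F_{p, n-p}$ distribution.

```python
import numpy as np
from scipy.stats import f


def hotelling_t2_test(x, mu0):
    """One-sample Hotelling T^2 test of H0: mean vector equals mu0.

    x is an (n, p) array of observations with n > p.
    Returns (t2, f_stat, p_value).
    """
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    x_bar = x.mean(axis=0)
    sigma_hat = np.cov(x, rowvar=False)  # sample covariance, 1/(n-1) scaling
    diff = x_bar - mu0
    # n * diff^T sigma_hat^{-1} diff, without forming the inverse explicitly
    t2 = n * diff @ np.linalg.solve(sigma_hat, diff)
    f_stat = (n - p) / (p * (n - 1)) * t2
    p_value = f.sf(f_stat, p, n - p)     # upper tail of F(p, n-p)
    return t2, f_stat, p_value


rng = np.random.default_rng(1)
true_mu = np.array([1.0, -0.5, 2.0])     # illustrative true mean
sample = rng.multivariate_normal(true_mu, np.eye(3), size=40)

# Test against the true mean, then against a clearly shifted one.
t2_null, f_null, p_null = hotelling_t2_test(sample, true_mu)
t2_alt, f_alt, p_alt = hotelling_t2_test(sample, true_mu + 1.0)
print(t2_null, p_null)
print(t2_alt, p_alt)
```

Under the shifted hypothesis the statistic should be much larger and the p-value much smaller than under the true mean.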

Wilks’ Lambda distribution

Wilks’ Lambda is another distribution, very similar to Snedecor-Fisher’s F distribution.

Just as the F distribution is the distribution of a ratio of two independent chi-square-distributed random variables (each divided by its degrees of freedom), Wilks’ Lambda distribution is the distribution of a ratio of determinants of two Wishart-distributed random matrices (note that the sum of two independent Wishart-distributed matrices with $m$ and $n$ degrees of freedom and identical covariance matrices is a Wishart-distributed matrix with $m+n$ degrees of freedom):

$\bm{A} \sim \mathcal{W}_p(m, \Sigma)$, $\bm{B} \sim \mathcal{W}_p(n, \Sigma)$, both independent,

$\lambda = \frac{\det(\bm{A})}{\det(\bm{A} + \bm{B})} \sim \Lambda(p,m,n)$.
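This definition is easy to simulate. The sketch below, assuming NumPy, draws the two Wishart matrices as sums of outer products of standard normal vectors and computes the ratio of determinants. The helper names `wishart_draws` and `sample_wilks_lambda` and all sizes are made up for illustration. I take $\Sigma = I$, which loses nothing here because the ratio of determinants does not change when both matrices are transformed by the same $\Sigma^{1/2}$. For $p = 1$ the ratio reduces to $\frac{\chi^2_m}{\chi^2_m + \chi^2_n}$, which is Beta-distributed with parameters $\frac{m}{2}$ and $\frac{n}{2}$, so its mean should be close to $\frac{m}{m+n}$.

```python
import numpy as np


def wishart_draws(df, p, size, rng):
    """Draw `size` matrices from W_p(df, I) as sums of df outer products."""
    x = rng.standard_normal((size, df, p))
    return np.einsum("kip,kiq->kpq", x, x)


def sample_wilks_lambda(p, m, n, size, rng):
    """Draw `size` values of det(A) / det(A + B), with m >= p."""
    a = wishart_draws(m, p, size, rng)
    b = wishart_draws(n, p, size, rng)
    return np.linalg.det(a) / np.linalg.det(a + b)


rng = np.random.default_rng(2)

# Illustrative parameters: p = 3, m = 10, n = 4.
lam = sample_wilks_lambda(3, 10, 4, 10000, rng)
print(lam.min(), lam.max())      # all values lie between 0 and 1

# Univariate special case: the mean should be close to m / (m + n).
lam_1d = sample_wilks_lambda(1, 10, 4, 10000, rng)
print(lam_1d.mean(), 10 / 14)
```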

Boris Burkov

Written by Boris Burkov who lives in Moscow, Russia and Cambridge, UK, loves to take part in development of cutting-edge technologies, reflects on how the world works and admires the giants of the past.