
Intro to the Extreme Value Theory and Extreme Value Distribution
April 30, 2023
Quite often in mathematical statistics I run into the Extreme Value Distribution - an analogue of the Central Limit Theorem, which describes the distribution of the maximum/minimum observed in a series of i.i.d. random variable tosses. This is an introductory text with the basic concepts and proofs of results from extreme value theory, such as the Generalized Extreme Value and Pareto distributions, the Fisher-Tippett-Gnedenko theorem, von Mises conditions, the Pickands-Balkema-de Haan theorem and their applications.
Contents:
- Problem statement and Generalized Extreme Value distribution
- Type I: Gumbel distribution
- Type II: Frechet distribution
- Type III: Inverse Weibull distribution
- Fisher-Tippett-Gnedenko theorem
- Examples of convergence
- General approach: max-stable distributions as invariants/fixed points/attractors and EVD types as equivalence classes
- Khinchin’s theorem (Law of Convergence of Types)
- Necessary conditions of maximum stability
- Fisher-Tippett-Gnedenko theorem (Extreme Value Theorem)
- Distributions not in domains of attraction of any maximum-stable distributions
- Von Mises sufficient conditions for a distribution to belong to a type I, II or III
- Pre-requisites from survival analysis
- Von Mises conditions proof
- Generalizations of von Mises condition for Type I EVD: auxiliary function and von Mises function
- Necessary and sufficient conditions for a distribution to belong to a type I, II or III
- Pre-requisites from Karamata’s theory of slow/regular/Г-/П- variation
- Necessary and sufficient conditions of convergence to Types II or III EVD
- Necessary and sufficient conditions of convergence to Type I EVD
- Residual life time
- Generalized Pareto distribution
- Residual life time problem
- Pickands-Balkema-de Haan theorem (a.k.a. Second Extreme Value Theorem)
- Order statistics and parameter estimation
- Order statistics
- Hill’s estimator
- Pickands’ estimator
- Other estimators
- Summary and examples of practical application
- Examples of Type I Gumbel distribution
- Examples of Type II Frechet distribution
- Examples of Type III Inverse Weibull distribution
- Concluding remarks
1. Problem statement and Generalized Extreme Value distribution
One of the most famous results in probability is the Central Limit Theorem, which claims that the sum of i.i.d. random variables, after centering and normalizing, converges to the Gaussian distribution.
Now, what if we ask a similar question about the maximum of those i.i.d. random variables instead of the sum? Does it converge to any distribution?
Turns out that it depends on the properties of the distribution of the underlying random variables, but not much, really. Regardless of that distribution, the limiting distribution of the maximum (after proper centering and normalizing) is:
$G_{\gamma}(x) = e^{-(1 + \gamma x)^{-1/\gamma}}$ (for $1 + \gamma x > 0$).
This distribution is called the Generalized Extreme Value distribution. Depending on the coefficient $\gamma$ it can take one of three specific forms:
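Before going through the three types one by one, here is a minimal sketch of all three regimes in a single plot. The values $\gamma = -0.5, 0, 0.5$, the guard against the bounded support and the treatment of $\gamma = 0$ as the Gumbel limit are my choices for illustration:
import numpy as np
import matplotlib.pyplot as plt

def gev_cdf(x, gamma):
    # Standard GEV cdf; gamma = 0 is treated as the Gumbel limit
    if gamma == 0:
        return np.exp(-np.exp(-x))
    z = 1 + gamma * x
    # The formula is only meaningful where 1 + gamma * x > 0
    cdf = np.where(z > 0, np.exp(-np.maximum(z, 1e-12) ** (-1 / gamma)), 0.0)
    if gamma < 0:
        # For gamma < 0 the support is bounded above; cdf = 1 beyond the endpoint
        cdf = np.where(z <= 0, 1.0, cdf)
    return cdf

x = np.arange(-4, 4, 0.01)
fig, ax = plt.subplots(figsize=(12, 8), dpi=100)
for gamma in (-0.5, 0.0, 0.5):
    ax.plot(x, gev_cdf(x, gamma), label=f'gamma = {gamma}')
ax.legend()
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Generalized EVD cdf for the three regimes of gamma')
plt.show()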
Type I: Gumbel distribution
If $\gamma \to 0$, we can consider the limiting case $\gamma = 0$. Then the generalized EVD converges to a doubly-exponential distribution (sometimes this is called a law of double logarithm), by the definition of the exponent: $\lim\limits_{\gamma \to 0} (1 + \gamma x)^{-1/\gamma} = e^{-x}$, so that:
$G(x) = e^{-e^{-x}}$.
This is the Gumbel distribution. It oftentimes occurs in various areas, e.g. in bioinformatics, where it describes the distribution of the longest series of successes in $n$ experiments of tossing a coin 100 times each.
It is often parametrized by scale and center parameters. I will keep it centered here, but will add a scale parameter $\lambda$:
$G(x) = e^{-e^{-x/\lambda}}$, or, in a more intuitive notation, $G(x) = \frac{1}{e^{e^{-x/\lambda}}}$.
It is straightforward to derive the probability density function from here:
$g(x) = \frac{dG}{dx} = \frac{1}{\lambda} e^{-\left(\frac{x}{\lambda} + e^{-x/\lambda}\right)}$.
import numpy as np
import matplotlib.pyplot as plt
scale = 1
# Generate x values from -20 to 20 with a step size of 0.1
x = np.arange(-20, 20, 0.1)
# Calculate cdf and pdf values
gumbel_cdf = np.exp(-np.exp(-x / scale))
gumbel_pdf = (1 / scale) * np.exp(-(x / scale + np.exp(-x / scale)))
# Create the figure and axis objects
fig, ax = plt.subplots(figsize=(12,8), dpi=100)
# Plot cdf
ax.plot(x, gumbel_cdf, label='cdf')
# Plot pdf
ax.plot(x, gumbel_pdf, label='pdf')
# Set up the legend
ax.legend()
# Set up the labels and title
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Plot of Gumbel pdf and cdf')
# Display the plot
plt.show()
Type II: Frechet distribution
If $\gamma > 0$, let us denote $\alpha = \frac{1}{\gamma}$ ($\alpha > 0$), where $\alpha$ is called the shape parameter, and let $\beta$ be a scale parameter. Then the distribution takes the shape:
$G(x) = e^{-\left(\frac{x}{\beta}\right)^{-\alpha}}$ for $x > 0$; $G(x) = 0$ for $x \le 0$.
To make it more intuitive, I'll re-write the cdf in the following way: $G(x) = \frac{1}{e^{(\beta/x)^{\alpha}}}$.
This is Frechet distribution. It arises when the tails of the original cumulative distribution function are heavy, e.g. when it is Pareto distribution.
Let us derive the probability density function for it:
$g(x) = \frac{dG}{dx} = \frac{\alpha}{\beta}\left(\frac{\beta}{x}\right)^{\alpha+1} e^{-(\beta/x)^{\alpha}}$.
Here is the plot:
import numpy as np
import matplotlib.pyplot as plt
shape = 2 # alpha
scale = 2 # beta
# Generate x values from 0.1 to 20 with a step size of 0.1 (avoid x = 0, where we would divide by zero)
x = np.arange(0.1, 20, 0.1)
# Calculate cdf and pdf values
frechet_cdf = np.exp(-(scale / x) ** shape)
frechet_pdf = (shape / scale) * ((scale / x) ** (shape + 1)) * np.exp(-((scale / x) ** shape))
# Create the figure and axis objects
fig, ax = plt.subplots(figsize=(12,8), dpi=100)
# Plot cdf
ax.plot(x, frechet_cdf, label='cdf')
# Plot pdf
ax.plot(x, frechet_pdf, label='pdf')
# Set up the legend
ax.legend()
# Set up the labels and title
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Plot of Frechet distribution pdf and cdf')
# Display the plot
plt.show()
Type III: Inverse Weibull distribution
If $\gamma < 0$, let us denote $\alpha = -\frac{1}{\gamma}$ ($\alpha > 0$; different kinds of behaviour are observed at $0 < \alpha < 1$, $\alpha = 1$ and $\alpha > 1$), and let $\beta$ be a scale parameter.
Then the distribution takes the shape:
$G(x) = e^{-\left(-\frac{x}{\beta}\right)^{\alpha}}$ for $x \le 0$;
$G(x) = 1$ for $x > 0$.
This is the Inverse Weibull distribution. Its direct counterpart (the Weibull distribution) often occurs in survival analysis as a hazard rate function. It also arises in mining, where it describes the mass distribution of particles of a given size, and it is closely connected to the Pareto distribution. We shall discuss this connection later.
The generalized extreme value distribution converges to the Inverse Weibull when the distribution of our random variable is bounded from above. E.g. consider the uniform distribution $U(0, 1)$. It is clear that the maximum of $n$ uniformly distributed variables will be approaching 1 as $n \to \infty$. Turns out that the rate of this convergence is described by the Inverse Weibull distribution.
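Here is a quick Monte Carlo sketch of this claim: since $p(M_n \le 1 + \frac{x}{n}) = (1 + \frac{x}{n})^n \to e^x$ for $x \le 0$, the rescaled maximum $n(M_n - 1)$ should follow the standard Inverse Weibull with $\alpha = 1$. The sample sizes and the seed are arbitrary choices:
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
n, n_trials = 1000, 5000

# Maximum of n Uniform(0, 1) variables in each trial, rescaled as n * (M_n - 1)
maxima = rng.uniform(0, 1, size=(n_trials, n)).max(axis=1)
rescaled = n * (maxima - 1)

x = np.sort(rescaled)
empirical_cdf = np.arange(1, n_trials + 1) / n_trials

fig, ax = plt.subplots(figsize=(12, 8), dpi=100)
ax.plot(x, empirical_cdf, label='empirical cdf of n * (max - 1)')
ax.plot(x, np.exp(x), label='limit cdf exp(x) for x <= 0')
ax.legend()
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Maxima of uniform variables vs Type III limit')
plt.show()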
To make it more intuitive, we can re-write the cdf as $G(x) = \frac{1}{e^{(-x/\beta)^{\alpha}}}$.
Let us derive the probability density function from the cumulative distribution function:
$g(x) = \frac{dG}{dx} = \frac{\alpha}{\beta}\left(-\frac{x}{\beta}\right)^{\alpha-1} e^{-(-x/\beta)^{\alpha}}$ for $x \le 0$.
Let us draw the plot:
import numpy as np
import matplotlib.pyplot as plt
shape = 2 # alpha
scale = 2 # beta
# Generate x values from -20 to 0 with a step size of 0.1
x = np.arange(-20, 0, 0.1)
# Calculate cdf and pdf values
inverse_weibull_cdf = np.exp(-(-x / scale) ** shape)
inverse_weibull_pdf = (shape / scale) * ((-x / scale) ** (shape - 1)) * np.exp(-((-x / scale) ** shape))
# Create the figure and axis objects
fig, ax = plt.subplots(figsize=(12,8), dpi=100)
# Plot cdf
ax.plot(x, inverse_weibull_cdf, label='cdf')
# Plot pdf
ax.plot(x, inverse_weibull_pdf, label='pdf')
# Set up the legend
ax.legend()
# Set up the labels and title
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Plot of Inverse Weibull pdf and cdf')
# Display the plot
plt.show()
2. Fisher-Tippett-Gnedenko theorem
The Extreme Value Theorem is a series of theorems, proven in the first half of the 20th century. They claim that the maximum of i.i.d. random variables, after proper centering and normalizing, converges to just one of 3 possible distributions: Gumbel, Frechet or Inverse Weibull.
Here I will lay out the outline of the proof with my comments. The proof includes the introduction of several technical tools, but I will comment on the function of and rationale behind each of them.
Consider a random variable $M_n$, which describes the maximum of $n$ i.i.d. random variables $\xi_i$:
$M_n = \max(\xi_1, \xi_2, ..., \xi_n)$.
Similarly to the Central Limit Theorem, a convergence theorem might be applicable to the distribution of a normalized random variable $\frac{M_n - b_n}{a_n}$ rather than the non-normalized $M_n$:
We aim to show that for some series of constants $\{a_n > 0\}$ and $\{b_n\}$,
$\frac{M_n - b_n}{a_n}$ as $n \to \infty$ converges in distribution to some non-degenerate distribution $G(x)$: $p\left(\frac{M_n - b_n}{a_n} \le x\right) \xrightarrow{n \to \infty} G(x)$.
Now I will provide several examples and then informally describe the proof outline, before introducing the mathematical formalism.
Examples of convergence
Let us start with simple examples of convergence to Types I and II to get a taste of how convergence of maxima works.
Example 2.1. Convergence of maximum of i.i.d. exponential random variables to Gumbel distribution
I’ll never believe that a horse and a wagon age in the same way.
- Alex Comfort (as quoted by Vladimir P. Skulachev)
Consider a wagon with an exponentially-distributed lifetime. For instance, assume that the wagon's chance to break down within any given year, while it is still in service, is the same.
Then let the random variable $\xi$ be its lifetime, with rate $\lambda$, and its cdf be:
$F_\xi(x) = p(\xi \le x) = 1 - e^{-\lambda x}$
Given $n$ wagons, what's the distribution of the lifetime of the longest-living one?
$p(M_n \le x) = F_\xi^n(x) = (1 - e^{-\lambda x})^n \approx e^{-n e^{-\lambda x}} = e^{-e^{-(\lambda x - \ln n)}}$
We see that this is almost the standard Gumbel distribution (Type I EVD).
In order to get the standard one, we need to consider a shifted and scaled random variable, by choosing $a_n = \frac{1}{\lambda}$, $b_n = \frac{\ln n}{\lambda}$, so that we get:
$p\left(\frac{M_n - b_n}{a_n} \le x\right) = \left(1 - \frac{e^{-x}}{n}\right)^n \xrightarrow{n \to \infty} e^{-e^{-x}}$.
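Let us check this convergence with a small simulation. The rate $\lambda = 0.5$, the sample sizes and the seed below are arbitrary choices:
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
lam, n, n_trials = 0.5, 1000, 5000

# Maximum lifetime among n wagons in each trial,
# normalized with a_n = 1/lam, b_n = ln(n)/lam
maxima = rng.exponential(scale=1/lam, size=(n_trials, n)).max(axis=1)
normalized = lam * maxima - np.log(n)

x = np.sort(normalized)
empirical_cdf = np.arange(1, n_trials + 1) / n_trials

fig, ax = plt.subplots(figsize=(12, 8), dpi=100)
ax.plot(x, empirical_cdf, label='empirical cdf of normalized maxima')
ax.plot(x, np.exp(-np.exp(-x)), label='standard Gumbel cdf')
ax.legend()
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Maxima of exponential lifetimes vs Gumbel cdf')
plt.show()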
Example 2.2. Convergence of maximum of i.i.d. Pareto random variables to Frechet distribution
Consider another example: the Pareto distribution. It describes the distribution of wealth (e.g. the fraction of people having net worth greater than $x$), sizes of cities (e.g. the fraction of cities with a population greater than $x$), sizes of orders on an exchange (e.g. the fraction of orders of size greater than $x$), the number of Telegram channels with more than $x$ subscribers. Here is its cdf:
$F_\xi(x) = 1 - \left(\frac{x_m}{x}\right)^k$ for $x \ge x_m$
It is often convenient to depict it in log-log coordinates, because the survival function becomes a straight line there: $\ln(1 - F_\xi(x)) = k \ln x_m - k \ln x$.
The Pareto distribution also describes the distribution of the lifetime of memes, comedy shows etc. This phenomenon is known as the Lindy effect. With a little calculation one can show that the expected remaining lifetime of a comedy show equals the time it's already been around, which leads to a Pareto-shaped cdf of its lifetime.
Again, let us consider the distribution of the maximum lifetime of $n$ comedy shows then:
$p(M_n \le x) = \left(1 - \left(\frac{x_m}{x}\right)^k\right)^n$, and choosing $a_n = x_m n^{1/k}$, $b_n = 0$:
$p\left(\frac{M_n}{a_n} \le x\right) = \left(1 - \frac{x^{-k}}{n}\right)^n \xrightarrow{n \to \infty} e^{-x^{-k}}$
We see that it converges to the Frechet (Type II EVD) distribution.
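Again, a small simulation to illustrate the convergence. The parameters $x_m = 1$, $k = 2$, the sample sizes and the seed are arbitrary; note that numpy's pareto() samples the Lomax distribution, hence the +1 below:
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
x_m, k, n, n_trials = 1.0, 2.0, 1000, 5000

# Classical Pareto(x_m, k) samples; numpy's pareto() samples Lomax, hence the +1
samples = x_m * (rng.pareto(k, size=(n_trials, n)) + 1)
# Normalize the maxima with a_n = x_m * n^(1/k), b_n = 0
normalized = samples.max(axis=1) / (x_m * n ** (1 / k))

x = np.sort(normalized)
empirical_cdf = np.arange(1, n_trials + 1) / n_trials

fig, ax = plt.subplots(figsize=(12, 8), dpi=100)
ax.plot(x, empirical_cdf, label='empirical cdf of normalized maxima')
ax.plot(x, np.exp(-x ** (-k)), label='Frechet cdf exp(-x^(-k))')
ax.legend()
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Maxima of Pareto variables vs Frechet cdf')
plt.show()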
Example 2.3. Convergence of minimum/maximum of i.i.d. Weibull/inverse Weibull random variables to Weibull/inverse Weibull distribution
Consider a single link of a chain, the strength of which is described by a Weibull distribution.
In this specific application of the Weibull distribution we may say that the strength of a link is the probability that it does not break under application of force $F$ (sorry for the duplication of the symbol $F$ in the notation, used both as cdf and force):
$p(\xi > F) = e^{-(F/\lambda)^k}$
Then we can show that the strength of the whole chain of $n$ links is also described by a Weibull distribution, as the chain is as strong as its weakest link:
$p(\min(\xi_1, ..., \xi_n) > F) = \left(e^{-(F/\lambda)^k}\right)^n = e^{-n(F/\lambda)^k} = e^{-\left(\frac{F}{\lambda n^{-1/k}}\right)^k}$
We see that the whole chain's strength also follows the Weibull distribution, but with a somewhat altered scale parameter: $\lambda_{chain} = \lambda \cdot n^{-1/k}$.
Similarly, we may consider a zipper, where the strength of a zipper is the strength of its hardest-to-open element. Hence, the strength of a zipper and of each element of it is described by the inverse Weibull distribution:
$p(\max(\xi_1, ..., \xi_n) \le x) = \left(e^{-(-x/\lambda)^{\alpha}}\right)^n = e^{-\left(-\frac{x}{\lambda n^{-1/\alpha}}\right)^{\alpha}}$ for $x \le 0$
Later on we will introduce the concept of max-stable/min-stable distributions, i.e. such distributions that the maximum/minimum of i.i.d. copies of them follows the same distribution. With this example we can already see that the Weibull distribution is min-stable and the inverse Weibull distribution is max-stable.
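A quick numeric illustration of min-stability of the Weibull distribution (the shape $k = 1.5$, scale $\lambda = 2$, chain length $n = 100$ and seed are arbitrary choices): the empirical cdf of the chain strength should match the Weibull cdf with scale $\lambda n^{-1/k}$ exactly, not just asymptotically.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
k, lam, n, n_trials = 1.5, 2.0, 100, 5000

# Strength of each of the n links ~ Weibull(shape=k, scale=lam);
# the strength of the chain is the minimum over its links
links = lam * rng.weibull(k, size=(n_trials, n))
chain = links.min(axis=1)

# Min-stability: the minimum should again be Weibull with scale lam * n^(-1/k)
x = np.sort(chain)
empirical_cdf = np.arange(1, n_trials + 1) / n_trials
weibull_cdf = 1 - np.exp(-(x / (lam * n ** (-1 / k))) ** k)

fig, ax = plt.subplots(figsize=(12, 8), dpi=100)
ax.plot(x, empirical_cdf, label='empirical cdf of chain strength')
ax.plot(x, weibull_cdf, label='Weibull cdf with scale lam * n^(-1/k)')
ax.legend()
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Minimum of Weibull link strengths is again Weibull')
plt.show()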
In these examples we used the same argument: $\lim\limits_{n \to \infty}\left(1 - \frac{f(x)}{n}\right)^n = e^{-f(x)}$.
It might seem that we can substitute more or less any function for the survival probability $1 - F(x)$ and make the distribution of the maximum take almost any shape.
However, it turns out that this intuition is wrong, as there are just 3 possible shapes of Extreme Value Distribution.
Let's see how (counter-intuitively) the maximum of i.i.d. Gaussian random variables converges to the Gumbel distribution.
Example 2.4. Convergence of maximum of i.i.d. Gaussian random variables to Gumbel distribution
Step 1. Integration in parts
Let us consider the cdf of the standard normal distribution $F(x) = 1 - \frac{1}{\sqrt{2\pi}}\int\limits_x^{\infty} e^{-t^2/2} dt$.
Do integration by parts:
$\int\limits_x^{\infty} e^{-t^2/2} dt = \int\limits_x^{\infty} \frac{1}{t} \cdot t e^{-t^2/2} dt = \frac{e^{-x^2/2}}{x} - \int\limits_x^{\infty} \frac{e^{-t^2/2}}{t^2} dt$, so that for large $x$: $1 - F(x) \approx \frac{e^{-x^2/2}}{x\sqrt{2\pi}}$.
Recall that we are interested in the maximum of $n$ variables $M_n$, not just a single $\xi$.
Hence, we want to study the behaviour of $F^n(a_n x + b_n)$ and to show that $F^n(a_n x + b_n) \to e^{-e^{-x}}$, which is equivalent to showing that $n(1 - F(a_n x + b_n)) \to e^{-x}$.
Step 2. Split result in two multipliers, select $a_n$
Let us use our freedom to choose $a_n$ and $b_n$.
First let us choose $a_n = \frac{1}{b_n}$, while letting $b_n \to \infty$ (we will choose a specific $b_n$ at a later stage).
Let $u = a_n x + b_n = \frac{x}{b_n} + b_n$, so that $\frac{u^2}{2} = \frac{b_n^2}{2} + x + \frac{x^2}{2 b_n^2}$:
$n(1 - F(u)) \approx \frac{n e^{-u^2/2}}{u\sqrt{2\pi}} = \left(e^{-x} \cdot e^{-\frac{x^2}{2 b_n^2}} \cdot \frac{b_n}{u}\right) \cdot \frac{n e^{-b_n^2/2}}{b_n \sqrt{2\pi}}$
By evaluating the square of $u = \frac{x}{b_n} + b_n$ we've managed to come up with 2 multipliers. The first of them contains $e^{-x}$, which is required for convergence to the Gumbel distribution. Note that as $b_n \to \infty$, the first term converges to $e^{-x}$, since $e^{-x^2/2b_n^2} \to 1$ and $\frac{b_n}{u} \to 1$:
The second term will produce the constant $1$ that we need to obtain $n(1 - F(a_n x + b_n)) \to e^{-x}$.
Step 3. Select $b_n$
Choose $b_n$, so that: $\frac{n e^{-b_n^2/2}}{b_n \sqrt{2\pi}} = 1$ (such $b_n$ exists for every large enough $n$, and $b_n \approx \sqrt{2 \ln n} \to \infty$).
Hence, we end up having $n(1 - F(a_n x + b_n)) \to e^{-x}$, where $a_n = \frac{1}{b_n}$ and $b_n$ is the root of $n e^{-b_n^2/2} = b_n \sqrt{2\pi}$.
Then $F^n(a_n x + b_n) \to e^{-e^{-x}}$, i.e. the normalized maxima converge to the Gumbel distribution.
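Let us verify this numerically. The fixed-point iteration below is my way of solving the equation from step 3 for $b_n$; the sample sizes and seed are arbitrary, and convergence for Gaussian maxima is notoriously slow, so the fit is only approximate:
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
n, n_trials = 1000, 5000

# Solve n * exp(-b^2/2) / (b * sqrt(2*pi)) = 1 for b_n by fixed-point iteration
b_n = np.sqrt(2 * np.log(n))
for _ in range(100):
    b_n = np.sqrt(2 * np.log(n / (b_n * np.sqrt(2 * np.pi))))
a_n = 1 / b_n

# Normalized maxima of n standard Gaussian variables vs the Gumbel limit
maxima = rng.standard_normal(size=(n_trials, n)).max(axis=1)
normalized = (maxima - b_n) / a_n

x = np.sort(normalized)
empirical_cdf = np.arange(1, n_trials + 1) / n_trials

fig, ax = plt.subplots(figsize=(12, 8), dpi=100)
ax.plot(x, empirical_cdf, label='empirical cdf of normalized maxima')
ax.plot(x, np.exp(-np.exp(-x)), label='standard Gumbel cdf')
ax.legend()
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Maxima of Gaussian variables vs Gumbel cdf (convergence is slow)')
plt.show()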
This is just one example of a distribution for which it doesn't appear at first glance that its maximum would converge to one of the 3 types of EVD, but its maximum actually does converge to Type I. We shall prove as a theorem that for i.i.d. random variables from any distribution there are just 3 possible types of distributions for their maximum to converge to. Let's first discuss the general approach of how this could be shown, and then we will proceed with a formal proof.
General approach: max-stable distributions as invariants/fixed points/attractors and EVD types as equivalence classes
I assume that all three types of Extreme Value Distribution were first discovered experimentally. Later statisticians came up with a proof that EVD can converge to just one of three possible types of distributions and no other types of EVD can exist. Finally, they came up with criteria for a distribution to belong to each type.
Design of this proof is similar to many other proofs. I will outline it informally here:
Assume that as the number of random variables increases, approaching infinity, the distribution of the observed maximum approaches some type of distribution. Then such a distribution type can be considered as an invariant or attractor or fixed point, similar to many other mathematical problems. For instance, eigenvectors are fixed points of matrix multiplication: an eigenvector, multiplied by a matrix, results in itself, multiplied by a scalar. Or, no matter how many times you take the derivative of $e^{kx}$, you get $e^{kx}$, multiplied by a scalar $k$.
Similarly, maximum-stable distributions are invariant objects. Those are distributions for which the maximum of i.i.d. variables, after centering and normalizing, converges to the same distribution, no matter how many more i.i.d. random variables you toss. E.g. if for one Gumbel-distributed random variable we know that $p(\xi \le x) = e^{-e^{-x}}$, then for $n$ Gumbel-distributed random variables the maximum of them still is Gumbel-distributed (after centering and normalizing them by some numbers $a_n$, $b_n$): $p\left(\frac{\max(\xi_1, ..., \xi_n) - b_n}{a_n} \le x\right) = e^{-e^{-x}}$.
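This invariance is easy to check numerically: since $G^n(x) = e^{-n e^{-x}} = G(x - \ln n)$ for the standard Gumbel cdf, the choice $a_n = 1$, $b_n = \ln n$ works exactly. The sample sizes and the seed below are arbitrary:
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
n, n_trials = 100, 5000

# Max of n standard Gumbel variables, recentered by b_n = ln(n) with a_n = 1:
# G^n(x) = exp(-n * exp(-x)) = G(x - ln(n)), so the result is again standard Gumbel
maxima = rng.gumbel(size=(n_trials, n)).max(axis=1) - np.log(n)

x = np.sort(maxima)
empirical_cdf = np.arange(1, n_trials + 1) / n_trials

fig, ax = plt.subplots(figsize=(12, 8), dpi=100)
ax.plot(x, empirical_cdf, label='empirical cdf of max - ln(n)')
ax.plot(x, np.exp(-np.exp(-x)), label='standard Gumbel cdf')
ax.legend()
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Max-stability of the Gumbel distribution')
plt.show()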
Ok. Then, after we've established that there are some distributions for which the maximum of centered and normalized i.i.d. variables produces a random variable with the same distribution, how do we show that all other distributions converge to one of them?
We'll use another classical mathematical tool: equivalence classes and equivalence relations. For instance, odd numbers and even numbers form two equivalence classes under the relation of congruence modulo 2. Odd numbers are equivalent to each other in terms of producing remainder 1 (e.g. $3 \sim 5$, where $\sim$ is the equivalence relation of congruence modulo 2), and even numbers are equivalent in terms of producing remainder 0.
Similarly, we will show that the types of EVD form equivalence classes under the operation of finding the maximum of i.i.d. random variables, and as a result all (suitable) distributions converge to one of those types. E.g. the Pareto distribution is equivalent to the Cauchy distribution under the equivalence relation of convergence of the maximum of Pareto/Cauchy i.i.d.'s to the same max-stable Type II (Frechet) EVD.
Now that I’ve laid out the plan of the proof, it is time to get into technicalities. I will formally introduce the concepts I mentioned above and prove some lemmas about their relatedness.
Definition 2.1: Max-stable cumulative distribution function
$G$ is max-stable if for all $n \in \mathbb{N}$ there exist $a_n > 0$, $b_n \in \mathbb{R}$ such that $G^n(a_n x + b_n) = G(x)$ for all $x \in \mathbb{R}$.
Definition 2.2: Domain of attraction
If $G$ is a cdf, then $F$ is in the domain of attraction (for maxima) of $G$, and it is written $F \in \mathcal{D}(G)$, when there exist sequences $\{a_n > 0\}$, $\{b_n\}$ such that $F^n(a_n x + b_n) \xrightarrow{n \to \infty} G(x)$.
Definition 2.3: Type of convergence
If $G_1$ and $G_2$ are two non-degenerate cdf's, we say that $G_1$ and $G_2$ have the same type if there exist constants $a > 0$ and $b \in \mathbb{R}$ such that for every $x \in \mathbb{R}$: $G_2(ax + b) = G_1(x)$.
Khinchin’s theorem (Law of Convergence of Types)
Lemma 2.1: Khinchin's theorem (Law of Convergence of Types)
Suppose that we have a sequence of distribution functions $\{F_n\}$ (e.g. the distributions of the maximum of a random variable in $n$ experiments).
Let those distribution functions converge to a certain non-degenerate distribution $G(x)$ upon normalization with two series of constants $\{a_n > 0\}$, $\{b_n\}$: $F_n(a_n x + b_n) \xrightarrow{n \to \infty} G(x)$.
Suppose there is another non-degenerate distribution function $H(x)$ such that the sequence of distributions $F_n(\alpha_n x + \beta_n)$ converges to that function with a different pair of series $\{\alpha_n > 0\}$, $\{\beta_n\}$: $F_n(\alpha_n x + \beta_n) \xrightarrow{n \to \infty} H(x)$.
Then $\frac{\alpha_n}{a_n} \xrightarrow{n \to \infty} a > 0$, $\frac{\beta_n - b_n}{a_n} \xrightarrow{n \to \infty} b$, and $H(x) = G(ax + b)$.
Proof:
Consider the two limiting distribution functions $G(x)$ and $H(x)$, such that for every $x$: $F_n(a_n x + b_n) \to G(x)$ and $F_n(\alpha_n x + \beta_n) \to H(x)$.
Pick a level $y$ and denote $x_1$ the point where $G(x_1) = y$ and $x_2$ the point where $H(x_2) = y$. Then $F_n(a_n x_1 + b_n) \to y$ and $F_n(\alpha_n x_2 + \beta_n) \to y$, so that asymptotically $a_n x_1 + b_n \approx \alpha_n x_2 + \beta_n$ (both sequences approach the same quantile of $F_n$).
Now choose two levels: $y'$, corresponding to $x_1'$ and $x_2'$, and $y''$, corresponding to $x_1''$ and $x_2''$, and subtract the resulting relations from each other:
$a_n(x_1' - x_1'') \approx \alpha_n(x_2' - x_2'')$
Which results in $\frac{\alpha_n}{a_n} \to \frac{x_1' - x_1''}{x_2' - x_2''} = a$.
Substitute this into $a_n x_1' + b_n \approx \alpha_n x_2' + \beta_n$ and divide by $a_n$: $x_1' + \frac{b_n}{a_n} \approx \frac{\alpha_n}{a_n} x_2' + \frac{\beta_n}{a_n}$, so that $\frac{\beta_n - b_n}{a_n} \to x_1' - a x_2' = b$.
On the other hand, for every $x$: $H(x) = \lim\limits_{n \to \infty} F_n(\alpha_n x + \beta_n) = \lim\limits_{n \to \infty} F_n\left(a_n\left(\frac{\alpha_n}{a_n} x + \frac{\beta_n - b_n}{a_n}\right) + b_n\right) = G(ax + b)$.
Hence, $H(x) = G(ax + b)$.
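A small numeric illustration of the theorem, with $F_n(x) = (1 - e^{-x})^n$ (the cdf of the maximum of $n$ exponential variables from example 2.1) and two normalizations of my choice:
import numpy as np

# Khinchin's theorem in action for F_n(x) = (1 - exp(-x))^n
n = 10 ** 6
x = np.linspace(-2, 4, 7)

def F_n(y):
    return np.clip(1 - np.exp(-y), 0, 1) ** n

# Normalization 1: a_n = 1, b_n = ln(n)  ->  G(x) = exp(-exp(-x))
G = F_n(x + np.log(n))
# Normalization 2: alpha_n = 2, beta_n = ln(n)  ->  H(x) = G(2x)
H = F_n(2 * x + np.log(n))

# alpha_n / a_n = 2 = a, (beta_n - b_n) / a_n = 0 = b, so H(x) = G(a x + b)
print(np.abs(G - np.exp(-np.exp(-x))).max())      # ~0
print(np.abs(H - np.exp(-np.exp(-2 * x))).max())  # ~0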
Lemma 2.2: Necessary condition of maximum-stability
Given a non-degenerate cdf $G$:
- $G$ is max-stable if and only if there exist a sequence of cdf's $\{F_n\}$ and sequences $\{a_n > 0\}$, $\{b_n\}$, such that for all $k \in \mathbb{N}$: $F_n(a_{nk} x + b_{nk}) \xrightarrow{n \to \infty} G^{1/k}(x)$
- $\mathcal{D}(G) \neq \varnothing$ if and only if $G$ is max-stable. In that case, $G \in \mathcal{D}(G)$.
Proof:
Proposition 1, direct statement: if $G$ is max-stable, there exist $\{F_n\}$, $\{a_n\}$, $\{b_n\}$ such that $F_n(a_{nk} x + b_{nk}) \xrightarrow{n \to \infty} G^{1/k}(x)$.
If $G$ is max-stable, then by definition for every $n$ there exist $a_n$, $b_n$, such that $G^n(a_n x + b_n) = G(x)$.
Define $F_n = G^n$. Then $F_n(a_{nk} x + b_{nk}) = G^n(a_{nk} x + b_{nk}) = \left(G^{nk}(a_{nk} x + b_{nk})\right)^{1/k} = G^{1/k}(x)$. We arrive at the direct statement.
Proposition 1, reverse statement: if such $\{F_n\}$, $\{a_n\}$, $\{b_n\}$ exist, then $G$ is max-stable.
Let us prove the reverse statement: suppose that the sequences $\{F_n\}$, $\{a_n\}$, $\{b_n\}$ exist, such that for all $k \in \mathbb{N}$:
$F_n(a_{nk} x + b_{nk}) \xrightarrow{n \to \infty} G^{1/k}(x)$
Then consider $k = 1$ and an arbitrary $k$:
$F_n(a_n x + b_n) \xrightarrow{n \to \infty} G(x)$ and $F_n(a_{nk} x + b_{nk}) \xrightarrow{n \to \infty} G^{1/k}(x)$
By Khinchin's lemma there exist $\alpha_k = \lim\limits_{n \to \infty}\frac{a_{nk}}{a_n}$ and $\beta_k = \lim\limits_{n \to \infty}\frac{b_{nk} - b_n}{a_n}$, such that $G^{1/k}(x) = G(\alpha_k x + \beta_k)$.
Similarly, for every other $k$: $G^{1/k}(x) = G(\alpha_k x + \beta_k)$, or $G(x) = G^k(\alpha_k x + \beta_k)$, which is the definition of max-stability.
Proposition 2, direct statement: if $G$ is max-stable, then $\mathcal{D}(G) \neq \varnothing$ and $G \in \mathcal{D}(G)$.
The proof is self-evident: if $G$ is max-stable, $G^n(a_n x + b_n) = G(x)$, and $G \in \mathcal{D}(G)$ by definition.
Proposition 2, reverse statement: if $\mathcal{D}(G) \neq \varnothing$, then $G$ is max-stable.
Assume $F \in \mathcal{D}(G)$, i.e. $F^n(a_n x + b_n) \xrightarrow{n \to \infty} G(x)$.
For all $k \in \mathbb{N}$ we have $F^{nk}(a_{nk} x + b_{nk}) \xrightarrow{n \to \infty} G(x)$.
Hence, $F^n(a_{nk} x + b_{nk}) \xrightarrow{n \to \infty} G^{1/k}(x)$.
This makes $F_n = F^n$ and $\{a_{nk}\}$, $\{b_{nk}\}$ fit the conditions of the previous result, proving that $G$ is max-stable.
Corollary 2.1:
Let $G$ be a max-stable cdf. Then there exist functions $a(s) > 0$ and $b(s)$ such that for all $x \in \mathbb{R}$ and for all $s > 0$: $G^s(a(s)x + b(s)) = G(x)$.
The corollary is self-evident from inversion of indices: taking $F_n = G^n$ in the previous lemma gives $G^{1/k}(x) = G(\alpha_k x + \beta_k)$ for every integer $k$, and combining integer powers $G^n$ with integer roots $G^{1/k}$ extends the relation from rational $s = \frac{n}{k}$ to all real $s > 0$.
Fisher-Tippett-Gnedenko theorem (Extreme Value Theorem)
(Portraits: Sir Ronald Aylmer Fisher, Leonard Henry Caleb Tippett, Boris Vladimirovich Gnedenko.)
Theorem 2.1: Fisher-Tippett-Gnedenko theorem (Extreme Value Theorem)
Let $\xi_1, \xi_2, ..., \xi_n$ be a sequence of i.i.d. random variables and $M_n = \max(\xi_1, ..., \xi_n)$.
If there exist constants $a_n > 0$, $b_n \in \mathbb{R}$ and some non-degenerate cumulative distribution function $G$ such that $p\left(\frac{M_n - b_n}{a_n} \le x\right) \to G(x)$, then $G$ is one of these:
(Type I) Gumbel: $G(x) = e^{-e^{-x}}$, $x \in \mathbb{R}$,
(Type II) Frechet: $G(x) = e^{-x^{-\alpha}}$ for $x > 0$ ($G(x) = 0$ for $x \le 0$), $\alpha > 0$,
(Type III) Inverse Weibull: $G(x) = e^{-(-x)^{\alpha}}$ for $x \le 0$ ($G(x) = 1$ for $x > 0$), $\alpha > 0$.
Proof
Here we give the proof of the Fisher-Tippett-Gnedenko theorem without introducing any additional pre-requisites and intermediate constructs. Because of that it might look like black magic now; it is not clear how anyone could've come up with this proof.
However, later on in parts 3 and 4 we will give the definitions of tail quantile function and tools from Karamata’s theory of slow/regular variation.
If you revisit this proof afterwards, you will notice that we’re making use of those tools, without naming them explicitly.
Step 1.
Consider the double negative logarithm of the max-stable distribution $G^s(a(s)x + b(s)) = G(x)$:
$-\ln(-\ln(G^s(a(s)x + b(s)))) = -\ln(-s \cdot \ln G(a(s)x + b(s))) = -\ln(-\ln G(a(s)x + b(s))) - \ln s = -\ln(-\ln G(x))$
Step 2.
Denote $\phi(x) = -\ln(-\ln G(x))$. Then from the previous step: $\phi(a(s)x + b(s)) - \ln s = \phi(x)$.
Step 3.
Denote $y = \phi(x)$. Apply $\phi^{-1}$ to both sides of $\phi(a(s)x + b(s)) = y + \ln s$. We get: $a(s)\phi^{-1}(y) + b(s) = \phi^{-1}(y + \ln s)$.
Step 4.
Note that, setting $y = 0$: $a(s)\phi^{-1}(0) + b(s) = \phi^{-1}(\ln s)$. Subtract it from both sides:
$a(s)\left(\phi^{-1}(y) - \phi^{-1}(0)\right) = \phi^{-1}(y + \ln s) - \phi^{-1}(\ln s)$
Step 5.
Substitute variables: $\psi(y) = \phi^{-1}(y) - \phi^{-1}(0)$, $z = \ln s$, $\tilde{a}(z) = a(e^z)$. Then:
$\psi(y + z) - \psi(z) = \tilde{a}(z)\psi(y)$
Step 6.
We can swap $y$ and $z$ in the previous equation, setting $y \to z$ and $z \to y$:
$\psi(z + y) - \psi(y) = \tilde{a}(y)\psi(z)$
After that subtract one from the other: $\psi(y) - \psi(z) = \tilde{a}(z)\psi(y) - \tilde{a}(y)\psi(z)$, or $\psi(y)(1 - \tilde{a}(z)) = \psi(z)(1 - \tilde{a}(y))$.
Here we consider two cases.
Step 7a.
If $\tilde{a}(z) \equiv 1$, the previous equation turns into a trivial identity. But then let's substitute $\tilde{a}(z) = 1$ into the result of step 5:
$\psi(y + z) = \psi(y) + \psi(z)$
This is Cauchy's functional equation; for monotone $\psi$ its solution is linear, $\psi(y) = \rho y$, and denoting $b = \phi^{-1}(0)$, we get $\phi^{-1}(y) = \rho y + b$, i.e. $-\ln(-\ln G(x)) = \frac{x - b}{\rho}$:
$G(x) = e^{-e^{-\frac{x - b}{\rho}}}$, which is Gumbel (Type I) EVD.
Step 7b.
If $\tilde{a}(z) \not\equiv 1$:
$\frac{\psi(y)}{1 - \tilde{a}(y)} = \frac{\psi(z)}{1 - \tilde{a}(z)} = c$, so that $\psi(y) = c(1 - \tilde{a}(y))$.
Now recall that $\psi(y + z) - \psi(z) = \tilde{a}(z)\psi(y)$ and substitute $\psi(y) = c(1 - \tilde{a}(y))$ there:
$c(1 - \tilde{a}(y + z)) - c(1 - \tilde{a}(z)) = \tilde{a}(z) \cdot c(1 - \tilde{a}(y))$
This leads us to the equation $\tilde{a}(y + z) = \tilde{a}(y)\tilde{a}(z)$, which, upon monotonous $\tilde{a}$, has a solution $\tilde{a}(y) = e^{\rho y}$. Hence:
$\psi(y) = c(1 - e^{\rho y})$, where $\rho \neq 0$.
Now recall that $\psi(y) = \phi^{-1}(y) - \phi^{-1}(0)$, and we get: $\phi^{-1}(y) = b + c(1 - e^{\rho y})$, where $b = \phi^{-1}(0)$. Hence, inverting $\phi^{-1}$ and recalling that $\phi(x) = -\ln(-\ln G(x))$:
$G(x) = e^{-\left(1 - \frac{x - b}{c}\right)^{-1/\rho}}$, which is either a Frechet (Type II), or an Inverse Weibull (Type III) EVD, depending on the sign of $\rho$.
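As a sanity check of the max-stability relation $G^s(a(s)x + b(s)) = G(x)$ that the proof started from, here is a numeric verification for the Frechet cdf $G(x) = e^{-x^{-\alpha}}$, for which $a(s) = s^{1/\alpha}$ and $b(s) = 0$ (the value $\alpha = 2.5$ is arbitrary):
import numpy as np

# Sanity check of max-stability G^s(a(s) x + b(s)) = G(x) for the Frechet cdf
# G(x) = exp(-x^(-alpha)), where a(s) = s^(1/alpha) and b(s) = 0
alpha = 2.5

def G(x):
    return np.exp(-x ** (-alpha))

x = np.linspace(0.5, 10, 20)
for s in (0.5, 2.0, 7.0, 100.0):
    lhs = G(s ** (1 / alpha) * x) ** s
    print(s, np.abs(lhs - G(x)).max())  # ~0 for every s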
Distributions not in domains of attraction of any maximum-stable distributions
We've shown that if the maximum of $n$ i.i.d. random variables of a given distribution converges to any maximum-stable distribution, it is one of the 3 described types. However, the maximum might not converge to any max-stable distribution at all.
For instance, the Poisson distribution and the Geometric distribution do not converge to any type of Extreme Value Distribution. To show this we will need many more tools in our toolbox; the corresponding theorem will be proven at the end of section 4.
3. Von Mises sufficient conditions for a distribution to belong to a type I, II or III
The Fisher-Tippett-Gnedenko theorem is an important theoretical result, but it does not provide an answer to the basic question: what type of EVD does our distribution function belong to?
Fortunately, there are two sets of criteria that let us determine the domain of attraction of a distribution $F$. First, there are von Mises conditions, which are sufficient, but not necessary. Still, they are more intuitive and give a good insight into what kinds of distributions converge to what types of EVD and why. Second, there are general sufficient and necessary conditions. Proving them is a much more technical task and requires some extra preliminaries.
We will start with the von Mises conditions, postulated by Richard von Mises in 1936, 7 years before the Fisher-Tippett-Gnedenko theorem was proven by Boris Gnedenko in 1943. Von Mises conditions are formulated in terms of survival analysis, so we shall introduce some basic notions from survival analysis first.
Pre-requisites from survival analysis
Definition 3.1: Survival function
The survival function is the complement of the cumulative distribution function $F_\xi(t)$: $S(t) = 1 - F_\xi(t) = p(\xi > t)$.
Basically, if our random variable's value represents human longevity, the cumulative distribution function $F_\xi(t)$ represents the fraction of people who die by the time $t$.
The survival function, on the contrary, is the fraction of people who are still alive by the time $t$.
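For instance, for exponentially distributed lifetimes the survival function is $S(t) = e^{-\lambda t}$; here is a small sketch comparing the empirical fraction of survivors with this formula ($\lambda = 0.1$, the sample size and the seed are arbitrary choices):
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
lam, n = 0.1, 10000

# Exponentially distributed lifetimes; the survival function is S(t) = exp(-lam * t)
lifetimes = rng.exponential(scale=1/lam, size=n)

t = np.arange(0, 60, 0.5)
# Fraction of the population still alive by time t
empirical_survival = (lifetimes[None, :] > t[:, None]).mean(axis=1)

fig, ax = plt.subplots(figsize=(12, 8), dpi=100)
ax.plot(t, empirical_survival, label='empirical survival function')
ax.plot(t, np.exp(-lam * t), label='S(t) = exp(-lam * t)')
ax.legend()
ax.set_xlabel('t')
ax.set_ylabel('fraction alive')
ax.set_title('Survival function of an exponentially distributed lifetime')
plt.show()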