BorisBurkov.net

Survival analysis - survival function, hazard rate, cumulative hazard rate, hazard ratio, Cox model

June 11, 2021

Here I discuss the statistical apparatus used in survival analysis and durability modelling.

Hazard rate and survival function

“Death of a person is a tragedy, deaths of millions is statistics” - Joseph Stalin

Hazard rate is just a renormalization of the probability space: it takes pallid, impersonal statistics as input and converts them into your own chances of living another day.

Suppose you’re an average young man in the Wild West. You decide to pursue a questionable career of a train robber.

Assume that the chance of an average guy surviving his first train robbery is $\frac{1}{2}$. After that you get slightly more experienced, and for your second train robbery your chance of survival is $\frac{2}{3}$. Now you’re even more experienced, and for the third stint the chance of survival is $\frac{3}{4}$.

So the night before your third robbery you might ask yourself whether it is worth the risk of dying with a 25% chance tomorrow, or whether you should give up on train robberies altogether and move on to start a career in finance.

The number you need to answer this question is your chance of dying tomorrow, given that you have made it this far, which is the hazard rate.

Unfortunately, it’s impossible to get data about your personal odds in real life. What you could do instead is take a look at the cumulative distribution function $F(t)$ of a train robber’s lifetime, or rather at its counterpart $S(t) = 1-F(t)$, called the survival function:

Survival function

The probability mass function (the discrete-case analogue of the continuous probability density function) gives the probability of dying at your third robbery: $p(\xi = 3) = \frac{1}{2} \cdot \frac{2}{3} \cdot \frac{1}{4} = \frac{1}{12}$, i.e. you survive the first two robberies and die at the third. We can more or less reformulate this as a continuous problem: $p(3 \leq \xi < 4) = F_\xi(4) - F_\xi(3) = f_\xi(3)dx$, where $\xi$ is a random variable indicating the number of the robbery at which an average train robber dies, $dx=1$, $F_\xi(x)$ is the cumulative distribution function and $f_\xi(x)$ is the probability density function.

So you see, the probability density function/probability mass function answers the wrong question. It says that out of all train robbers the fraction that dies at their third robbery is $\frac{1}{12}$. But the question you want to ask is: if I go for my third robbery tomorrow, what are my chances of surviving it? And the answer you want is $\frac{3}{4}$.
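To make the difference between the two questions concrete, here is a minimal sketch that starts from the per-robbery survival chances of the example and derives both the fraction of all robbers who die at each robbery and the fraction still alive afterwards. The variable names (`survive`, `alive`, `pmf`) are my own, chosen for illustration.

```python
from fractions import Fraction

# Per-robbery survival chances from the example: 1/2, 2/3, 3/4.
survive = [Fraction(1, 2), Fraction(2, 3), Fraction(3, 4)]

alive = Fraction(1)  # fraction of robbers still alive before robbery 1
pmf = []             # pmf[k]: fraction of ALL robbers who die at robbery k + 1
for p in survive:
    pmf.append(alive * (1 - p))  # alive so far, then killed in this robbery
    alive *= p                   # survivors carry on to the next robbery

print(pmf)    # [Fraction(1, 2), Fraction(1, 6), Fraction(1, 12)]
print(alive)  # 1/4
```

The last entry of `pmf` is what the population statistics report, while `1 - survive[2]`, i.e. 1/4, is the risk that matters to someone who is already standing in front of the third train.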

Now, let’s start formalizing this. For a discrete-time variable, the hazard function is your chance of dying during robbery number $t$, given that you have survived all the previous ones:

$$\underbrace{S(t) - S(t+1)}_\text{fraction of train robbers who die at t} = \underbrace{\lambda(t)}_\text{hazard function at t} \cdot \underbrace{S(t)}_\text{fraction of survivors by t} \cdot \underbrace{\delta t}_\text{1 in discrete-time case}$$

Thus, hazard function is defined as:

$$\lambda(t) = \frac{-\delta S(t)}{\delta t \cdot S(t)}$$
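As a minimal sketch of this definition with $\delta t = 1$, the hazards can be recovered from a tabulated survival function. The helper name `discrete_hazard` and the list indexing are assumptions made for illustration; the values in `S` are the survivor fractions implied by the train robber example.

```python
from fractions import Fraction

def discrete_hazard(S):
    """Return lambda(t) = (S(t) - S(t+1)) / S(t) for consecutive entries of S (delta t = 1)."""
    return [(S[t] - S[t + 1]) / S[t] for t in range(len(S) - 1)]

# S[t]: fraction of robbers still alive just before robbery t + 1 (illustrative indexing).
S = [Fraction(1), Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)]

hazards = discrete_hazard(S)
print(hazards)  # [Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)]
```

Dividing by $S(t)$ is exactly the renormalization: the deaths at step $t$ are counted relative to those who are still around, not relative to everybody who ever started.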

Or, in continuous-time case:

$$\lambda(t) = \frac{-\partial S(t)}{\partial t \cdot S(t)} = \frac{f(t)}{S(t)}$$
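The two forms of the continuous-time definition can be checked against each other numerically. The sketch below assumes a Weibull-distributed lifetime purely as a convenient example (it is not part of the train robber story), and the parameter values are arbitrary.

```python
import math

def weibull_survival(t, k, scale):
    """S(t) for a Weibull lifetime with shape k and the given scale."""
    return math.exp(-((t / scale) ** k))

def weibull_density(t, k, scale):
    """f(t) = -dS/dt for the same Weibull lifetime."""
    return (k / scale) * (t / scale) ** (k - 1) * math.exp(-((t / scale) ** k))

def hazard_from_density(t, k, scale):
    # lambda(t) = f(t) / S(t)
    return weibull_density(t, k, scale) / weibull_survival(t, k, scale)

def hazard_from_slope(t, k, scale, h=1e-6):
    # lambda(t) = -S'(t) / S(t), with S'(t) approximated by a central difference
    dS = weibull_survival(t + h, k, scale) - weibull_survival(t - h, k, scale)
    return -dS / (2 * h) / weibull_survival(t, k, scale)

t, k, scale = 2.0, 1.5, 3.0  # arbitrary illustrative values
print(hazard_from_density(t, k, scale), hazard_from_slope(t, k, scale))
```

Both numbers should agree up to the finite-difference error, and both match the closed-form Weibull hazard $\frac{k}{s}\left(\frac{t}{s}\right)^{k-1}$.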

In the continuous-time case, instead of simply multiplying the per-step survival chances, you need to wrap your head around integrating the hazard function. Let us calculate, by induction, the risk a person would accumulate over a period of time $t$:

$$S(t_0) - S(t_0+dt) = \lambda(t_0) dt \cdot S(t_0)$$

$$S(t_0+dt) - S(t_0+2dt) = \lambda(t_0+dt) dt \cdot S(t_0+dt)$$

Thus, summing those up:

$$\begin{aligned} S(t_0) - S(t_0+2dt) &= \lambda(t_0) dt \cdot S(t_0) + \lambda(t_0+dt) dt \cdot S(t_0+dt) \\ &= \lambda(t_0) dt \cdot S(t_0) + \lambda(t_0+dt) dt \cdot S(t_0) \cdot \underbrace{(1-\lambda(t_0)dt)}_\text{1, neglect dt-squared} \\ &= \lambda(t_0) dt \cdot S(t_0) + \lambda(t_0+dt) dt \cdot S(t_0) \end{aligned}$$

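$$S(t_0) - S(t_0+t) \approx \underbrace{ (\lambda(t_0) + \lambda(t_0 + dt) + \lambda(t_0 + 2dt) + ... + \lambda(t_0 + t - dt))}_\text{t / dt times} \cdot dt \cdot S(t_0) = \int \limits_{t_0}^{t_0+t} \lambda(u)du \cdot S(t_0)$$

Dropping the higher-order terms is only safe while the accumulated hazard $\int \lambda(u)du$ is small. Over longer periods the pool of survivors shrinks noticeably along the way, and the exact relation, $S(t) = e^{-\Lambda(t)}$, appears in the next section.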
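The step-by-step recursion above can be run numerically to see how far the first-order sum can be trusted. This is a sketch under stated assumptions: the linearly growing `hazard` function is hypothetical, the function names are my own, and $S(t_0)$ is normalized to 1.

```python
import math

def hazard(u):
    # Hypothetical hazard used only for illustration: grows linearly with time.
    return 0.1 + 0.05 * u

def survive_stepwise(t0, t, n_steps):
    """Apply S(u + dt) = S(u) * (1 - hazard(u) * dt) n_steps times, starting from S(t0) = 1."""
    dt = t / n_steps
    s = 1.0
    for i in range(n_steps):
        s *= 1.0 - hazard(t0 + i * dt) * dt
    return s

def integrated_hazard(t0, t):
    # Exact integral of 0.1 + 0.05 * u over [t0, t0 + t].
    a, b = t0, t0 + t
    return 0.1 * (b - a) + 0.025 * (b * b - a * a)

for t in (0.1, 5.0):
    stepwise = survive_stepwise(0.0, t, 100_000)       # recursion with a tiny dt
    linear = 1.0 - integrated_hazard(0.0, t)           # first-order sum of hazards
    exponent = math.exp(-integrated_hazard(0.0, t))    # exponential form
    print(t, stepwise, linear, exponent)
```

For the short horizon all three numbers nearly coincide. For the long one the first-order value even drops below zero, while the recursion keeps following the exponential form.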

Again, from a Bayesian point of view, the hazard rate (multiplied by $dt$) can be viewed as $\lambda(t)dt = P(t < T \leq t+dt \mid T > t) = \frac{ p(t < T \leq t+dt \cap T > t) }{p(T > t)} = \frac{ p(t < T \leq t+dt) }{p(T > t)} = \frac{f(t)dt}{S(t)}$.

Cumulative hazard rate

The integral of the hazard rate $\int \limits_{t_0}^{t_0+t} \lambda(u) du$ that we used in the previous section is useful in itself.

If it is taken between points of time 0 and t: $\Lambda(t) = \int \limits_{0}^{t} \lambda(u) du$, it is called the cumulative hazard rate $\Lambda(t)$.

Cumulative hazard rate $\Lambda(t)$ is a funny thing. It essentially enumerates and sums up all the chances of death you escaped by the current moment. So, for instance, at your first train robbery you had a chance to die of $1/2$, at the second - $1/3$, at the third - $1/4$.

So by the time you start contemplating your fourth robbery, the “number of deaths” you deserved by now is $\Lambda(t) = 1/2 + 1/3 + 1/4 \approx 1.0833$, so in a fair world you would have already been more than dead, exercising your luck so readily…

A corollary from the definition of cumulative hazard rate is its connection to survival function:

$\Lambda(t) = \int \limits_{0}^{t} -\frac{S'(u)}{S(u)} du = \int \limits_{0}^{t} -\frac{1}{S(u)} dS(u) = -\ln S(t) + \ln S(0) = -\ln S(t)$, because $S(0) = 1$. Hence, $S(t) = e^{-\Lambda(t)}$.
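This identity is easy to sanity-check numerically. The sketch below assumes a Weibull lifetime (an arbitrary choice with a known closed-form survival function), integrates its hazard with the trapezoidal rule and compares the result with $-\ln S(t)$. All names and parameter values are illustrative.

```python
import math

def hazard(u, k=1.5, scale=3.0):
    """Weibull hazard, used here only as an example with a known answer."""
    return (k / scale) * (u / scale) ** (k - 1)

def survival(t, k=1.5, scale=3.0):
    """Closed-form Weibull survival function."""
    return math.exp(-((t / scale) ** k))

def cumulative_hazard(t, n=100_000):
    """Trapezoidal approximation of the integral of hazard(u) over [0, t]."""
    du = t / n
    total = 0.5 * (hazard(0.0) + hazard(t))
    for i in range(1, n):
        total += hazard(i * du)
    return total * du

t = 4.0
Lambda = cumulative_hazard(t)
print(Lambda, -math.log(survival(t)))   # cumulative hazard vs. -ln S(t)
print(math.exp(-Lambda), survival(t))   # exp(-Lambda) vs. S(t)
```

Each printed pair should agree to several decimal places; the residual difference comes from the numerical integration.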

Cox proportional hazards model and hazard ratio

Sir David Cox came up with a linear regression-ish model for the factors influencing the hazard rate:

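$$\lambda(t|X_i) = \lambda_0(t) e^{\beta_1 x_1 + \beta_2 x_2 + ... + \beta_n x_n}$$

Here $\lambda_0(t)$ is the baseline hazard, shared by everyone, and the exponent scales it up or down depending on the values of the factors.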

For Cox proportional hazards models you’d often consider log hazard rate instead of the hazard rate itself.

The hazard ratio reflects the difference in hazard rates between subjects with different values of the factors. For instance, for patients 1 and 2 with factor values $x_i$ and $x_i'$:

$$\lambda_1(t) = \lambda_0(t) e^{\sum \limits_{i=1}^{n}\beta_i x_i}$$

$$\lambda_2(t) = \lambda_0(t) e^{\sum \limits_{i=1}^{n}\beta_i x_i'}$$

Then the hazard ratio equals:

$$\frac{\lambda_1(t)}{\lambda_2(t)} = \frac{\lambda_0(t) e^{\sum \limits_{i=1}^{n}\beta_i x_i}}{\lambda_0(t) e^{\sum \limits_{i=1}^{n}\beta_i x_i'}} = \frac{e^{\sum \limits_{i=1}^{n}\beta_i x_i}}{e^{\sum \limits_{i=1}^{n}\beta_i x_i'}} = e^{\sum \limits_{i=1}^{n}\beta_i (x_i - x_i')}$$

The baseline hazard $\lambda_0(t)$ cancels out, so the hazard ratio does not depend on time. This is what “proportional hazards” refers to.
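A minimal sketch of this computation in plain Python follows. The coefficients and factor values are made up for illustration and do not come from any fitted model; the function names are likewise my own.

```python
import math

def cox_hazard(baseline, beta, x):
    """Hazard baseline * exp(sum_i beta_i * x_i) for one subject at a fixed time."""
    return baseline * math.exp(sum(b * xi for b, xi in zip(beta, x)))

def hazard_ratio(beta, x, x_prime):
    """Hazard ratio exp(sum_i beta_i * (x_i - x_i')); the baseline has cancelled out."""
    return math.exp(sum(b * (xi - xpi) for b, xi, xpi in zip(beta, x, x_prime)))

# Hypothetical coefficients and factor values, for illustration only.
beta = [0.7, -0.2]
patient_1 = [1.0, 3.0]
patient_2 = [0.0, 3.0]

hr = hazard_ratio(beta, patient_1, patient_2)
print(hr)

# The ratio is the same whatever the baseline hazard happens to be at a given time.
for baseline in (0.01, 0.5, 2.0):
    print(cox_hazard(baseline, beta, patient_1) / cox_hazard(baseline, beta, patient_2))
```

The two patients differ only in the first factor, so the ratio reduces to $e^{\beta_1}$, which is how a single coefficient of such a model is usually read.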

I can’t say much on this subject as I haven’t used these models yet.



Written by Boris Burkov who lives in Moscow, Russia and Cambridge, UK, loves to take part in development of cutting-edge technologies, reflects on how the world works and admires the giants of the past. You can follow me on Telegram