Notes on Reproducing Kernel Hilbert Space (RKHS)

August 03, 2021 4 min read

Reproducing Kernel Hilbert Space is an object from the field of functional analysis that nowadays finds practical applications in both data science and quantum computing. In data science you might run into it, while studying topics associated with kernel methods, such as kernel trick in SVMs, kernel regressions and kernel PCA, while in quantum mechanics Hilbert spaces and symmetric/Hermitian operators are generally the language of the whole thing. In this post I'll try to summarize my readings about it, linearizing the pre-requisites into a coherent story.

Our today’s subject consists of 4 words and 3 concepts - Reproducing, Kernel and Hilbert spaces. I’ll first explain the meaning of each of them separately, providing the missing context, and then will consider them together. RKHS is a beast with multiple faces, so you can approach it from different perspectives and then find out that they are all interconnected - quite a typical situation for the kaleidoscope of mathematics. I’ll discuss those approaches in the second part of the post.

Integral transform and its kernel

The term “kernel” here in Reproducing Kernel Hilbert Space is used in the context of integral transforms, which has absolutely nothing (1, 2) to do with the kernel of homomorphism or linear transformation.

As an example of integral transformation consider a sound signal - you have a WAV-file, which is a function f(t)f(t) in the time domain, which equals to the air pressure, produced by your headphones at every given moment of time, while they play the signal. WAV-files are typically huge, so you want to compress the singal by decomposing the sound into harmonics of various frequencies and sacrificing the very high-frequency ones, as 1) their amplitudes are typically very low and 2) human ear can’t hear them anyways. This is more or less how MP3 files work and that’s why they are much smaller than WAVs.


A spectrogram of sound from Audacity software: air pressure is at the top, amplitudes of harmonics are at the bottom.

So, you do a Fourier series decomposition of your initial signal f(t)f(t), which is an integral transform:

f^(ω)=0te2πiωtf(t)dt\hat{f}(\omega) = \int \limits_{0}^{t} e^{-2 \pi i \omega t} f(t) dt , where f^(ω)\hat{f}(\omega) is the amplitude of harmonic with frequency ω\omega.

Speaking generally, what we did here was an integral transform from a function f(t)f(t) in the time domain to a function f^(ω)\hat{f}(\omega) in the frequency domain.

This is formally written as:

(Tf)(u)=t1t2K(t,u)f(t)dt(Tf)(u) = \int \limits_{t_1}^{t_2} K(t, u) f(t) dt

Here TfTf is an integral transform of function f(t)f(t), which results in a function in a different domain uu, and K(t,u)K(t, u) is the kernel of integral transform. In case of Fourier transform our kernel was K(t,ω)=e2πiωtK(t, \omega) = e^{-2 \pi i \omega t}.

RKHS deals with some special case of kernels, which possess some nice and helpful properties, which we shall discuss later.

Hilbert spaces

Hilbert spaces are arguably the main subject of study of functional analysis.

Why Banach-Wiener spaces wouldn’t cut it?


Infinite-dimensional vectors as Hilbert spaces: linear algebra perspective


Existence of countable basis: Fourier series perspective


Reproducing property


ML perspective: Hilbert space as a feature space


Kernels as inner product


Example: gaussian mixtures and RBF


Mercer theorem



Boris Burkov

Written by Boris Burkov who lives in Moscow, Russia and Cambridge, UK, loves to take part in development of cutting-edge technologies, reflects on how the world works and admires the giants of the past. You can follow me in Telegram