BorisBurkov.net
cover

Notes on Reproducing Kernel Hilbert Space (RKHS)

August 03, 2021 4 min read

Reproducing Kernel Hilbert Space is an object from the field of functional analysis that nowadays finds practical applications in both data science and quantum computing. In data science you might run into it, while studying topics associated with kernel methods, such as kernel trick in SVMs, kernel regressions and kernel PCA, while in quantum mechanics Hilbert spaces and symmetric/Hermitian operators are generally the language of the whole thing. In this post I'll try to summarize my readings about it, linearizing the pre-requisites into a coherent story.

Our today’s subject consists of 4 words and 3 concepts - Reproducing, Kernel and Hilbert spaces. I’ll first explain the meaning of each of them separately, providing the missing context, and then will consider them together. RKHS is a beast with multiple faces, so you can approach it from different perspectives and then find out that they are all interconnected - quite a typical situation for the kaleidoscope of mathematics. I’ll discuss those approaches in the second part of the post.

Integral transform and its kernel

The term “kernel” here in Reproducing Kernel Hilbert Space is used in the context of integral transforms, which has absolutely nothing (1, 2) to do with the kernel of homomorphism or linear transformation.

As an example of integral transformation consider a sound signal - you have a WAV-file, which is a function f(t)f(t) in the time domain, which equals to the air pressure, produced by your headphones at every given moment of time, while they play the signal. WAV-files are typically huge, so you want to compress the singal by decomposing the sound into harmonics of various frequencies and sacrificing the very high-frequency ones, as 1) their amplitudes are typically very low and 2) human ear can’t hear them anyways. This is more or less how MP3 files work and that’s why they are much smaller than WAVs.

spectrogram

A spectrogram of sound from Audacity software: air pressure is at the top, amplitudes of harmonics are at the bottom.

So, you do a Fourier series decomposition of your initial signal f(t)f(t), which is an integral transform:

f^(ω)=0te2πiωtf(t)dt\hat{f}(\omega) = \int \limits_{0}^{t} e^{-2 \pi i \omega t} f(t) dt , where f^(ω)\hat{f}(\omega) is the amplitude of harmonic with frequency ω\omega.

Speaking generally, what we did here was an integral transform from a function f(t)f(t) in the time domain to a function f^(ω)\hat{f}(\omega) in the frequency domain.

This is formally written as:

(Tf)(u)=t1t2K(t,u)f(t)dt(Tf)(u) = \int \limits_{t_1}^{t_2} K(t, u) f(t) dt

Here TfTf is an integral transform of function f(t)f(t), which results in a function in a different domain uu, and K(t,u)K(t, u) is the kernel of integral transform. In case of Fourier transform our kernel was K(t,ω)=e2πiωtK(t, \omega) = e^{-2 \pi i \omega t}.

RKHS deals with some special case of kernels, which possess some nice and helpful properties, which we shall discuss later.

Hilbert spaces

Hilbert spaces are arguably the main subject of study of functional analysis.

Why Banach-Wiener spaces wouldn’t cut it?

TODO

Infinite-dimensional vectors as Hilbert spaces: linear algebra perspective

TODO

Existence of countable basis: Fourier series perspective

TODO

Reproducing property

TODO

ML perspective: Hilbert space as a feature space

TODO

Kernels as inner product

TODO

Example: gaussian mixtures and RBF

TODO

Mercer theorem

TODO

References


Boris Burkov

Written by Boris Burkov who lives in Moscow, Russia and Cambridge, UK, loves to take part in development of cutting-edge technologies, reflects on how the world works and admires the giants of the past. You can follow me in Telegram