Notes on Reproducing Kernel Hilbert Space (RKHS)
August 03, 2021 4 min read
Reproducing Kernel Hilbert Space is an object from the field of functional analysis that nowadays finds practical applications in both data science and quantum computing. In data science you might run into it, while studying topics associated with kernel methods, such as kernel trick in SVMs, kernel regressions and kernel PCA, while in quantum mechanics Hilbert spaces and symmetric/Hermitian operators are generally the language of the whole thing. In this post I'll try to summarize my readings about it, linearizing the pre-requisites into a coherent story.
Our today’s subject consists of 4 words and 3 concepts - Reproducing, Kernel and Hilbert spaces. I’ll first explain the meaning of each of them separately, providing the missing context, and then will consider them together. RKHS is a beast with multiple faces, so you can approach it from different perspectives and then find out that they are all interconnected - quite a typical situation for the kaleidoscope of mathematics. I’ll discuss those approaches in the second part of the post.
Integral transform and its kernel
The term “kernel” here in Reproducing Kernel Hilbert Space is used in the context of integral transforms, which has absolutely nothing (1, 2) to do with the kernel of homomorphism or linear transformation.
As an example of integral transformation consider a sound signal - you have a WAV-file, which is a function in the time domain, which equals to the air pressure, produced by your headphones at every given moment of time, while they play the signal. WAV-files are typically huge, so you want to compress the singal by decomposing the sound into harmonics of various frequencies and sacrificing the very high-frequency ones, as 1) their amplitudes are typically very low and 2) human ear can’t hear them anyways. This is more or less how MP3 files work and that’s why they are much smaller than WAVs.
So, you do a Fourier series decomposition of your initial signal , which is an integral transform:
, where is the amplitude of harmonic with frequency .
Speaking generally, what we did here was an integral transform from a function in the time domain to a function in the frequency domain.
This is formally written as:
Here is an integral transform of function , which results in a function in a different domain , and is the kernel of integral transform. In case of Fourier transform our kernel was .
RKHS deals with some special case of kernels, which possess some nice and helpful properties, which we shall discuss later.
Hilbert spaces are arguably the main subject of study of functional analysis.
Why Banach-Wiener spaces wouldn’t cut it?
Infinite-dimensional vectors as Hilbert spaces: linear algebra perspective
Existence of countable basis: Fourier series perspective
ML perspective: Hilbert space as a feature space
Kernels as inner product
Example: gaussian mixtures and RBF
- https://www.youtube.com/watch?v=XUj5JbQihlU - incredibly fun and comprehensible talk by enthusiastic Yaser Abu-Mostafa from Caltech
- https://www.gatsby.ucl.ac.uk/~gretton/coursefiles/Slides4A.pdf - presentation by brilliant Arthur Gretton from UCL
- https://en.wikipedia.org/wiki/Reproducing_kernel_Hilbert_space - a decent and helpful explanation from wikipedia
- https://stats.stackexchange.com/questions/123413/using-a-gaussian-kernel-in-svm-how-exactly-is-this-then-written-as-a-dot-produc - a nice question, explaining why RBF kernel is actually a RKHS kernel
- https://en.wikipedia.org/wiki/Hilbert_space - one of the best articles on mathematics, I’ve ever seen on wikipedia
Written by Boris Burkov who lives in Moscow, Russia and Cambridge, UK, loves to take part in development of cutting-edge technologies, reflects on how the world works and admires the giants of the past. You can follow me in Telegram