BorisBurkov.net

OpenStack, Kubernetes and OpenShift crash course for the impatient - introduction

January 18, 2018 2 min read

Much like the junkie from a Russian joke who started shouting "Jiggers, cops!" when they brought him to the police station, EBI in 2018 suddenly discovered the existence of cloud technologies.

What solutions are available, and which one should you use in your particular case?

Currently, two cloud solutions are being adopted by EBI at the same time: OpenStack and OpenShift.

I'm using the terms IaaS, PaaS and SaaS in their usual sense: infrastructure, platform and software as a service, respectively.

OpenStack is a lower-level IaaS solution. It allows you to provision VMs on demand from images that you make, mount data volumes and monitor the state of your VMs via a web dashboard. It's unaware of Docker or any other software you run on top of it: you have to program the logic of server provisioning, software installation, failover etc. yourself.
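To give a feel for what "provision VMs on demand" looks like at the API level, here is a minimal sketch in Python that builds the JSON body of a server-create request for the OpenStack Compute (Nova) API. The image, flavor and network identifiers and the install script are placeholders I made up for illustration, and authentication and the HTTP call itself are left out.

```python
import base64
import json


def build_server_request(name, image_id, flavor_id, network_id, install_script):
    """Build the body of a POST /servers request for the Nova compute API."""
    # Nova expects user_data to be base64-encoded. If the image ships
    # cloud-init, the script runs on first boot, which is where the
    # "software installation" logic would have to live.
    user_data = base64.b64encode(install_script.encode("utf-8")).decode("ascii")
    return {
        "server": {
            "name": name,
            "imageRef": image_id,
            "flavorRef": flavor_id,
            "networks": [{"uuid": network_id}],
            "user_data": user_data,
        }
    }


# All identifiers below are hypothetical placeholders.
request_body = build_server_request(
    name="worker-1",
    image_id="IMAGE-UUID",
    flavor_id="FLAVOR-ID",
    network_id="NETWORK-UUID",
    install_script="#!/bin/bash\napt-get update && apt-get install -y docker.io\n",
)
print(json.dumps(request_body, indent=2))
```

Note that nothing here restarts the VM if it dies or installs anything beyond what the script does; with plain OpenStack, that logic is yours to write.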

OpenShift is a higher-level PaaS solution that pretty much offers Kubernetes on top of OpenStack (OpenShift = Kubernetes + OpenStack).

Kubernetes, originally developed at Google, is the winner of the orchestration race, according to what I read and hear at meetups and in talks from tech consultancies (notable rivals being Docker Swarm, Apache Mesos and HashiCorp Nomad).

Kubernetes allows you to declaratively configure the deployment of your application as a set of Docker containers on top of whatever infrastructure you have. It automatically monitors your application state, restarts crashed containers, manages service discovery, allows for horizontal scaling and redundancy of your microservices, supports rolling and blue-green deployments of new versions, re-deploys on webhooks from GitHub etc.
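As an illustration of what "declaratively" means here, below is a minimal sketch of a Kubernetes Deployment manifest, built as a Python dict and dumped to JSON. The application name, image name and port are hypothetical; the point is that you describe the desired state (three replicas, rolling updates, a liveness probe) rather than the steps to reach it.

```python
import json


def build_deployment(name, image, replicas, port):
    """Return a Kubernetes Deployment manifest as a plain dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            # Desired number of identical pods (horizontal scaling/redundancy).
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            # Replace pods gradually when a new version is rolled out.
            "strategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {"maxUnavailable": 1, "maxSurge": 1},
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                            # The container is restarted if this check fails.
                            "livenessProbe": {
                                "httpGet": {"path": "/", "port": port},
                                "periodSeconds": 10,
                            },
                        }
                    ]
                },
            },
        },
    }


# Hypothetical application name and image.
manifest = build_deployment("web", "example/web:1.0", replicas=3, port=8000)
print(json.dumps(manifest, indent=2))
```

Saved to a file, the output could be applied with `kubectl apply -f`, which accepts JSON as well as YAML; Kubernetes then keeps working to make the cluster match the description.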

OpenShift just provides first-class integration of Kubernetes with OpenStack off the shelf.

So, I believe, EBI could use both for different purposes.

OpenShift is perfect for website deployments. For instance, in the RNAcentral.org deployment we will have multiple microservices: at least the web server itself and background worker processes for sequence search and for exporting text search results.

Kubernetes will handle their orchestration, redundancy/horizontal scaling, disaster recovery and monitoring, and will partially automate CI/CD. (Of course, we can try running Kubernetes on top of OpenStack ourselves, but OpenShift seems to be doing exactly that in a nicer fashion.)

OpenStack is better suited for running the bioinformatics pipelines of release jobs (like we do for Rfam and RNAcentral). We might not need Kubernetes there: we can deploy an existing pipeline solution like Toil or Arvados on top of OpenStack as the compute backend (or just replace the LSF cluster commands in our existing scripts with calls to OpenStack APIs).
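A rough sketch of that last option, under my own assumptions: a single function that returns either the usual `bsub` command line or an `openstack server create` command line for the same job. The flavor mapping, image name and job names are hypothetical, and a real pipeline would more likely call the OpenStack API directly and also tear the VM down afterwards.

```python
# Hypothetical mapping from requested cores to OpenStack flavor names;
# real flavor names depend on how the cloud is configured.
FLAVOR_BY_CORES = {1: "m1.small", 2: "m1.medium", 4: "m1.large"}
WORKER_IMAGE = "pipeline-worker"  # hypothetical image name


def build_submit_command(job_name, command, cores, backend):
    """Return the argv that would run one pipeline job on the given backend."""
    if backend == "lsf":
        # Classic LSF submission: job name, slot count, log file (%J = job ID).
        return ["bsub", "-J", job_name, "-n", str(cores),
                "-o", f"{job_name}.%J.log", command]
    if backend == "openstack":
        # Boot one VM per job. The job's command is assumed to have been
        # written to <job_name>.sh beforehand, so that cloud-init in the
        # image runs it on first boot.
        return ["openstack", "server", "create",
                "--image", WORKER_IMAGE,
                "--flavor", FLAVOR_BY_CORES[cores],
                "--user-data", f"{job_name}.sh",
                "--wait", job_name]
    raise ValueError(f"unknown backend: {backend}")


for backend in ("lsf", "openstack"):
    argv = build_submit_command("export-results", "python export.py", 2, backend)
    print(" ".join(argv))
```

The resulting list can be handed to `subprocess.run`, so existing scripts would only need to change the place where the submit command is assembled.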

This is a brief crash course on OpenStack, Kubernetes and OpenShift for the purposes of bioinformatics.

Part 1. OpenStack


Written by Boris Burkov, who lives in Moscow, Russia and Cambridge, UK, loves to take part in building future technologies, thinks about the world we're living in at present and admires the giants of the past. You can follow me on Telegram.