Advanced Infrastructure Management in Kubernetes using Python

Automate the management of complex applications in a cloud-native way using Operators written in Python

Gautam Prajapati

Deployment/Continuous Integration and Delivery DevOps general Distributed Systems Infrastructure python

*** Talk Details and Timeline

** Why? (5 mins)
Start with a simple example to connect the audience with the topic and a real-world scenario.

I. A typical Python application running in production can be configured using ConfigMap and Secret objects. Let’s say you change a value in that configuration. You would typically patch or re-run the deployment by hand so that the server reloads the configuration.

Now imagine we had a watcher configured that watches for changes in the secrets and automatically triggers the patching of our server. Wouldn’t that make day-to-day operations so much easier?
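The watcher idea can be sketched in a few lines. This is a minimal illustration, not the talk's implementation: the annotation key `example.com/config-checksum` and all function names are hypothetical. It hashes the configuration data and, when the hash changes, builds a patch for the Deployment's pod template. Changing a pod-template annotation is enough to make Kubernetes roll the pods.

```python
import hashlib
import json

# Hypothetical annotation key, chosen for illustration only.
CHECKSUM_ANNOTATION = "example.com/config-checksum"


def config_checksum(data: dict) -> str:
    """Return a stable SHA-256 digest of ConfigMap/Secret data."""
    canonical = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def build_rollout_patch(checksum: str) -> dict:
    """Build a patch body that changes the pod template annotations."""
    return {
        "spec": {
            "template": {
                "metadata": {"annotations": {CHECKSUM_ANNOTATION: checksum}}
            }
        }
    }


def on_config_change(old_data: dict, new_data: dict):
    """Return a patch if the configuration changed, otherwise None."""
    old_sum = config_checksum(old_data)
    new_sum = config_checksum(new_data)
    if old_sum == new_sum:
        return None  # nothing to do, avoid a needless restart
    return build_rollout_patch(new_sum)
```

In a real watcher, the returned body would be sent to the API server, for example with the official Python client's `AppsV1Api.patch_namespaced_deployment(name, namespace, body)`.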

II. A bit more complex scenario to connect with real-world applications
For Python applications, Celery has become the de facto standard for processing tasks in a distributed way. While running Celery in production, many things might require human intervention -
- It’s not easy to get a new setup right; there are many ways to misconfigure the message broker, logging, concurrency control and so on
- You might want to set up Flower for every Celery deployment to have adequate monitoring in place
- You might want to have an HPA configured on queue length to increase or decrease the number of workers
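As a rough sketch of the queue-length scaling in the last point: the HPA documents its rule as desiredReplicas = ceil(currentReplicas × currentMetric / desiredMetric). When the metric is the average queue length per worker, that reduces to ceil(queue_length / target_per_worker). The function name, parameters and default bounds below are assumptions for illustration, and the sketch ignores the HPA's tolerance and stabilization behaviour.

```python
def desired_workers(queue_length: int, target_per_worker: int,
                    min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Return the worker count that keeps the per-worker backlog at target."""
    if queue_length < 0 or target_per_worker < 1:
        raise ValueError("queue_length must be >= 0 and target_per_worker >= 1")
    # Integer ceiling division avoids floating-point rounding surprises.
    desired = -(-queue_length // target_per_worker)
    # Clamp to the configured bounds, as an HPA does with min/maxReplicas.
    return max(min_replicas, min(max_replicas, desired))
```

For example, with a target of 10 queued tasks per worker, a backlog of 45 tasks asks for 5 workers, and an empty queue falls back to `min_replicas`.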

III. Many stateful applications, like databases, message queuing systems, caching systems, need a specific way of handling how they start, how they scale, and how they are shut down. All of this application-specific logic can be automated by writing an operator, right in Python.

** Solution

** Intro (3 min)
Many of us use Kubernetes in production. The Operator pattern, much talked about these days, lets you write code that automates repeatable tasks beyond Kubernetes' native capabilities. Golang is currently the de facto standard language for writing Operators. This talk also aims to encourage writing operators, and supporting frameworks, in the Python ecosystem.

The general idea is to keep engineers focused on writing code, while the operator injects the necessary pieces of infrastructure (init containers, ConfigMaps, environment variables, logging setup, etc.) and manages them at deployment time.

The Celery example is also under discussion in the Celery Enhancement Proposals (CEPs), Issue #24. Later in the talk, we’ll see a basic Celery operator, written in Python, in action.

** Kubernetes

* Brief About Kubernetes architecture (5 mins)
- Scalability and separation of workloads
- Current stateless and stateful applications support
- Extending capabilities, native libraries in almost all popular languages including Python
- Brief architecture overview, explaining the components (API Server, Nodes, Kubelet, etc.)
- Self-healing capabilities (with a diagram of desired vs. current state)

* Brief about Controllers (3 mins)
- The observer pattern
- Reconciliation loop
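A reconciliation loop can be illustrated without a cluster. The sketch below is a toy model under stated assumptions: state is kept in plain dictionaries, and `apply_actions` stands in for the API calls a real controller would make. It compares desired and observed state, derives the actions needed, and repeats until the two match.

```python
def reconcile(desired: dict, observed: dict) -> list:
    """Compare desired and observed state; return the actions to converge."""
    actions = []
    for name, spec in desired.items():
        if name not in observed:
            actions.append(("create", name))
        elif observed[name] != spec:
            actions.append(("update", name))
    for name in observed:
        if name not in desired:
            actions.append(("delete", name))
    return actions


def apply_actions(actions: list, desired: dict, observed: dict) -> None:
    """Apply actions to the in-memory state (stand-in for real API calls)."""
    for verb, name in actions:
        if verb == "delete":
            del observed[name]
        else:
            observed[name] = dict(desired[name])


desired = {"worker": {"replicas": 3}, "flower": {"replicas": 1}}
observed = {"worker": {"replicas": 2}, "stale-job": {"replicas": 1}}

# Keep reconciling until there is nothing left to do.
actions = reconcile(desired, observed)
while actions:
    apply_actions(actions, desired, observed)
    actions = reconcile(desired, observed)
```

A real controller runs the same comparison every time a watch event arrives, rather than in a tight loop.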

* Brief about CRDs (Custom Resource Definitions) (4 mins)
- Show what a regular resource looks like (Pod/Service YAMLs)
- A CRD for a Celery application as an example
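To give a feel for what such a CRD might contain, here is a sketch written as Python dictionaries that mirror the YAML manifests. The API group `example.com`, the kind `CeleryApp`, the version `v1alpha1` and every field under `spec` are hypothetical choices for illustration; the talk's actual schema may differ.

```python
GROUP = "example.com"   # hypothetical API group
PLURAL = "celeryapps"   # hypothetical plural resource name

celery_crd = {
    "apiVersion": "apiextensions.k8s.io/v1",
    "kind": "CustomResourceDefinition",
    # A CRD's name must have the form <plural>.<group>.
    "metadata": {"name": f"{PLURAL}.{GROUP}"},
    "spec": {
        "group": GROUP,
        "scope": "Namespaced",
        "names": {"plural": PLURAL, "singular": "celeryapp", "kind": "CeleryApp"},
        "versions": [{
            "name": "v1alpha1",
            "served": True,
            "storage": True,
            "schema": {"openAPIV3Schema": {
                "type": "object",
                "properties": {"spec": {
                    "type": "object",
                    "required": ["image", "brokerUrl"],
                    "properties": {
                        "image": {"type": "string"},
                        "brokerUrl": {"type": "string"},
                        "workers": {"type": "integer", "minimum": 1},
                        "cpu": {"type": "string"},
                        "memory": {"type": "string"},
                    },
                }},
            }},
        }],
    },
}

# A custom resource that an application team could submit once the CRD exists.
celery_app = {
    "apiVersion": f"{GROUP}/v1alpha1",
    "kind": "CeleryApp",
    "metadata": {"name": "demo"},
    "spec": {
        "image": "example/tasks:latest",        # placeholder image name
        "brokerUrl": "redis://redis:6379/0",    # placeholder broker address
        "workers": 2,
    },
}
```

Serialising either dictionary to YAML gives a manifest that can be compared with the regular Pod and Service YAMLs shown before it.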

* What is an operator? (2 min)
- A Kubernetes extension that automates maintaining and monitoring your software
- A custom controller combined with CRDs, which encapsulates your business logic
- Lets you automate backups, scaling, configuration updates, and any other task you would perform while operating an application

* Finally, some code (12-15 mins)
- Demonstration of a basic Celery operator: write a single task with some basic configuration details, and the operator spins up a production-ready (subjective) Celery deployment for that task.
- Final code of operator will be open-source and live in this WIP repo -
- A structured tour of the operator code
- Scope of project to include:
I. A CRD for Celery applications containing basic configuration parameters such as the number of workers, the broker, and memory and CPU constraints
II. A custom controller implementation that watches the K8s API Server for changes and reacts to them
III. The controller can spawn new Kubernetes Deployment objects for Celery and for Flower (monitoring)
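The third point can be sketched as a pure function that turns a custom resource's `spec` into Deployment manifests. Everything named here is an assumption for illustration: the spec fields (`image`, `brokerUrl`, `workers`, `app`, `cpu`, `memory`), the naming scheme, and the expectation that the application reads its broker URL from the `CELERY_BROKER_URL` environment variable and that the image has Flower installed.

```python
from typing import Optional


def build_deployment(name: str, image: str, command: list, replicas: int,
                     env: dict, limits: Optional[dict] = None) -> dict:
    """Build an apps/v1 Deployment manifest with a single container."""
    labels = {"app": name}
    container = {
        "name": name,
        "image": image,
        "command": command,
        "env": [{"name": k, "value": v} for k, v in sorted(env.items())],
    }
    if limits:
        container["resources"] = {"limits": limits}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


def manifests_for(cr_name: str, spec: dict) -> list:
    """Return the worker and Flower Deployments for one custom resource."""
    env = {"CELERY_BROKER_URL": spec["brokerUrl"]}
    limits = {k: spec[k] for k in ("cpu", "memory") if k in spec}
    app = spec.get("app", "tasks")  # hypothetical Celery app module name
    worker = build_deployment(
        f"{cr_name}-worker", spec["image"],
        ["celery", "-A", app, "worker"], spec.get("workers", 1), env, limits)
    flower = build_deployment(
        f"{cr_name}-flower", spec["image"],
        ["celery", "-A", app, "flower"], 1, env)
    return [worker, flower]
```

Keeping manifest construction free of API calls makes this part of a controller easy to unit test; the controller's event handler would then submit the returned objects to the API server.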

*** Conclusion (5 min)

** What are some other use-cases?
- Database clusters (MySQL, Postgres Operators)
- Publishing a Service so that applications that don’t support the Kubernetes APIs can be discovered; for example, exposing the Flower UI/API in our Celery operator as a Kubernetes Service
- The registry of Kubernetes Operators -
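For the Service use case in the list above, a minimal sketch of what an operator might generate for Flower follows. The function name and the label values are hypothetical; port 5555 is Flower's default HTTP port.

```python
FLOWER_PORT = 5555  # Flower's default HTTP port


def build_flower_service(name: str, pod_labels: dict,
                         port: int = FLOWER_PORT) -> dict:
    """Build a ClusterIP Service that routes traffic to the Flower pods."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "ClusterIP",
            # The selector must match the labels on the Flower pods.
            "selector": dict(pod_labels),
            "ports": [{"name": "http", "port": port, "targetPort": port}],
        },
    }


service = build_flower_service("demo-flower", {"app": "demo-flower"})
```

Other workloads in the cluster could then reach the Flower UI/API through the Service's DNS name instead of tracking individual pods.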

** When should you use operators, and when shouldn’t you?
** What are people doing with operators?

*** Q&A (5 min)
** Expected Questions from the Audience

- Difference between an Operator and Custom Controller
- More open-source tools to try out writing an operator
- Helm vs Ansible vs Operator

Type: Talk (45 mins); Python level: Beginner; Domain level: Intermediate

Gautam Prajapati


I'm a Software Engineer working with Grofers, the largest online grocery shopping platform in India. I'm an open-source evangelist with decent experience building production-ready systems with Python and Kubernetes at a scale of a million daily active users. I'm a Google Summer of Code'17 scholar and have contributed to the codebases of LibreOffice, Mozilla and others in the past.

When I'm not coding, I give back to the community by organizing regular support-group circles, volunteer drives and providing emotional counselling to cancer patients and caregivers.