Solving Kubernetes Configuration Woes with a Custom Controller
Joel is a Cloud Infrastructure Engineer at Pusher, working on building their internal Kubernetes platform. He has been working in DevOps for over four years. Recently he has been focusing on projects to extend Kubernetes, building admission and custom resource controllers to improve developer experience on the platform. Joel is passionate about auto-scaling, deployment pipelines, authentication and authorization. He also built and maintains Marvin, a ChatOps bot, for the Pusher engineering team.
Two years ago, Pusher started building an internal Kubernetes-based platform. As we transitioned from a single-product to a multi-product company, we wanted to help our product teams spend less time worrying about shared concerns such as infrastructure and focus more on writing business logic for our products.
Over this period, our platform team have solved many of the problems that Kubernetes doesn’t solve out of the box. Until recently, though, we had not solved the problem of configuration. Unfortunately, in a vanilla Kubernetes cluster, unlike changes to most resources, changes to configuration (stored in ConfigMaps and Secrets) do not trigger any change to the state of applications running inside the cluster.
Many applications running on our platform, both first and third party, cannot dynamically reload their configuration. When their configuration is updated, files mounted into the containers may be updated (for example, if they are mounted as a volume from a ConfigMap), but the applications do not watch for changes. They will only load the new configuration if the process is restarted or, more typically, new Pods are created and the existing Pods terminated.
Since Kubernetes normally reconciles your desired state for you, a typical user might assume that updating a ConfigMap or Secret will make applications load the new configuration. Since this is not the case, it is very easy for a user to update a ConfigMap or Secret and not replace the Pods that mount the config, thus leaving the running configuration differing from the updated desired configuration.
What Were Our Configuration Woes?
Normally in Kubernetes, when you update your desired state, a control loop looks at the state of the world and adjusts it to bring it in line with your desired change. When you update the template for a Pod within a Deployment, the Deployment controller will replace all existing Pods with new ones matching the updated specification.
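To make the idea of a control loop concrete, here is a minimal sketch of a single reconcile step in Go. It is an illustration only, not how the Deployment controller is implemented: the podSet type and the reconcile function are hypothetical, and a Pod's "version" stands in for its full specification.

```go
package main

import (
	"fmt"
	"sort"
)

// podSet is a hypothetical stand-in for observed cluster state:
// Pod name -> version of the template the Pod was created from.
type podSet map[string]string

// reconcile compares the observed Pods against the desired template
// version and returns the names of the Pods that need replacing,
// sorted so the result is deterministic.
func reconcile(desiredVersion string, observed podSet) []string {
	var stale []string
	for name, version := range observed {
		if version != desiredVersion {
			stale = append(stale, name)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	observed := podSet{"bridge-a": "v1", "bridge-b": "v2", "bridge-c": "v1"}
	// Pods still running v1 are reported as needing replacement.
	fmt.Println(reconcile("v2", observed))
}
```

A real controller repeats this step every time the desired or observed state changes, which is exactly the part that is missing for ConfigMaps and Secrets.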
Configuration in Kubernetes is stored in ConfigMaps and Secrets. These are mounted into Pods via file mounts or environment variables to allow the container processes to read the data. There is, however, a problem with this approach. Neither ConfigMaps nor Secrets are versioned, nor do they have control loops. While updating a ConfigMap or Secret will eventually update the files mounted from it into a Pod as a volume, environment variables are only read when a container starts, and in neither case does the update trigger any change to the workloads running in the cluster.
Products on the Pusher platform rely on a key component called the Bridge. The Bridge is part of our ingress layer, routing traffic to backend services and handling lots of Pusher protocol logic in the process. The Bridge, unfortunately, cannot dynamically reload its configuration.
The component is essential to the operation of all of our products that run on the Platform (Chatkit, Beams, TextSync) and when it goes down, everything else goes with it.
We haven’t had too many incidents on the platform in the last two years (admittedly we have only been GA for a few months), but when we have had outages, the post-mortem almost always leads to the same conclusion: the config running inside the Bridge was not the config inside the ConfigMap that it mounts.
Most of the incidents have been triggered by a member of the infrastructure team rolling out changes to the Kubernetes cluster. We run all of our machines immutably and as such, when we want to make changes, we replace every node in the cluster and, in turn, every Pod. New Pods will start with whichever config exists at that point.
We have found that, despite the procedures we have in place, the Pods for our components weren’t always being replaced when the configuration was updated and, in some cases, Pods only got replaced about two weeks after a broken config was applied. If you have ever been in the same situation, you’ll understand how much of a nightmare it is to have no idea what has changed and, given there’s no versioning, to be unable to roll back.
How We Solved the Problem
We are currently in the process of automating our deployment pipeline using our GitOps project Faros. Our Platform Services team have a number of services that can’t dynamically reload configuration and raised concerns about the project. To be able to synchronize configuration updates on git merge, they needed a guarantee that configuration updates would actually be deployed and, in the case that the config is broken, someone would be notified to git revert the changes they just deployed.
This is where our new project Wave, a custom controller built using Kubebuilder, comes in. Wave’s role is to ensure that when a ConfigMap or Secret is updated, any Deployment mounting said ConfigMap or Secret replaces all of its Pods. Wave effectively nudges the Kubernetes built-in Deployment controller and gets it to perform a rolling update, removing Pods running the old configuration and creating new Pods with the updated configuration.
How Does Wave Work?
Wave, like other Kubernetes controllers, subscribes to events from the Kubernetes API for Deployment objects. This means that, whenever a Deployment is created, updated or deleted, Wave can process the Deployment and see if it needs to make any changes.
The first thing that Wave does is check for the presence of an Annotation on the Deployment (wave.pusher.com/update-on-config-change: "true"). If the Annotation is not present, Wave ignores the Deployment. This makes Wave an opt-in controller: to benefit from it, users have to explicitly mark their Deployment to be managed by Wave. It is, therefore, safe to deploy to existing Kubernetes clusters without worrying that it will suddenly start interfering with deployed workloads that don’t need its update-triggering capability.
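The opt-in check can be sketched in a few lines of Go. This is a simplified illustration rather than Wave's actual code: the isManaged helper is hypothetical and takes a plain map in place of a real Deployment's metadata, and treating only the exact value "true" as opted in is an assumption.

```go
package main

import "fmt"

// updateAnnotation is the Annotation key named in the article.
const updateAnnotation = "wave.pusher.com/update-on-config-change"

// isManaged is a hypothetical helper that reports whether a Deployment
// has opted in. A missing Annotation (or a nil map) means "not managed".
// Assumption: only the exact string "true" counts as opting in.
func isManaged(annotations map[string]string) bool {
	return annotations[updateAnnotation] == "true"
}

func main() {
	optedIn := map[string]string{updateAnnotation: "true"}
	fmt.Println(isManaged(optedIn)) // the Deployment would be processed
	fmt.Println(isManaged(nil))     // the Deployment would be ignored
}
```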
Secondly, Wave parses the Deployment and looks for references to ConfigMaps and Secrets that are mounted into the Pods created by the Deployment. It then fetches each of the mounted ConfigMaps and Secrets and, using a reproducible algorithm, creates a hash of the current configuration. The hash represents the configuration from all mounted ConfigMaps and Secrets and will only change if new mounts are added, mounts are taken away, or any of the fields in data within a mount are modified.
This hash is then placed, as an Annotation, on the Pod Template within the Deployment. This is where we leverage the built-in Kubernetes Deployment controller. Because the metadata of the Pod Template has been modified, the Deployment controller counts this as an update and starts processing the Deployment’s update strategy.
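One way to build such a reproducible hash is sketched below. The article does not describe Wave's actual algorithm, so everything here is an assumption made for illustration: the configSource type, the configHash function, the choice of SHA-256, and the exact bytes fed into the hash. The point is that sorting sources and keys before hashing makes the result independent of map and mount ordering.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// configSource is a hypothetical, simplified view of a mounted
// ConfigMap or Secret.
type configSource struct {
	Kind string // "ConfigMap" or "Secret"
	Name string
	Data map[string]string
}

// configHash returns a hex-encoded SHA-256 over all sources. Sources and
// their data keys are sorted first so the same configuration always
// produces the same hash, regardless of iteration order.
func configHash(sources []configSource) string {
	// Work on a copy so the caller's slice is not reordered.
	sorted := append([]configSource(nil), sources...)
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].Kind != sorted[j].Kind {
			return sorted[i].Kind < sorted[j].Kind
		}
		return sorted[i].Name < sorted[j].Name
	})

	h := sha256.New()
	for _, s := range sorted {
		// %q quotes each field so that field boundaries are unambiguous.
		fmt.Fprintf(h, "%q/%q\n", s.Kind, s.Name)
		keys := make([]string, 0, len(s.Data))
		for k := range s.Data {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		for _, k := range keys {
			fmt.Fprintf(h, "%q=%q\n", k, s.Data[k])
		}
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	before := []configSource{{Kind: "ConfigMap", Name: "bridge", Data: map[string]string{"timeout": "30s"}}}
	after := []configSource{{Kind: "ConfigMap", Name: "bridge", Data: map[string]string{"timeout": "60s"}}}
	fmt.Println("hash changed:", configHash(before) != configHash(after))
}
```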
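The step of recording the hash on the Pod Template might look roughly like the following sketch. The podTemplate type is a pared-down stand-in for the real Kubernetes type, and both the setConfigHash helper and the annotation key example.com/config-hash are hypothetical; the key Wave really uses may differ. Returning whether anything changed lets the caller skip the API update, and therefore the rollout, when the configuration is unchanged.

```go
package main

import "fmt"

// hashAnnotation is a hypothetical key used for illustration only.
const hashAnnotation = "example.com/config-hash"

// podTemplate is a simplified stand-in for a Deployment's Pod Template.
type podTemplate struct {
	Annotations map[string]string
}

// setConfigHash stores the hash on the Pod Template and reports whether
// the template was modified. Only a modified template would need to be
// written back to the API, which is what triggers a rolling update.
func setConfigHash(t *podTemplate, hash string) bool {
	if t.Annotations[hashAnnotation] == hash {
		return false // nothing to do, no rollout needed
	}
	if t.Annotations == nil {
		t.Annotations = map[string]string{}
	}
	t.Annotations[hashAnnotation] = hash
	return true
}

func main() {
	tmpl := &podTemplate{}
	fmt.Println(setConfigHash(tmpl, "abc123")) // first write modifies the template
	fmt.Println(setConfigHash(tmpl, "abc123")) // same hash again is a no-op
	fmt.Println(setConfigHash(tmpl, "def456")) // a new hash modifies it again
}
```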
In summary, whenever config is updated, the hash that Wave calculates will change, Wave will update the hash on the Deployment’s Pod Template, and then the Deployment controller will read the update strategy and start a rollout of the new configuration by creating a new ReplicaSet and in turn, new Pods.
Since Wave also subscribes to events for ConfigMaps and Secrets, it can use Owner References to track which Deployments are mounting which ConfigMaps and Secrets and, whenever an update occurs to one of these, reconcile the parent Deployment to update its hash. We don’t, however, use the garbage collection part of Owner References. Wave removes any Owner References once a Deployment is marked for deletion, otherwise the ConfigMaps and Secrets that the Deployment owned would be deleted too.
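The Owner Reference bookkeeping described above can be sketched as follows. The types are simplified stand-ins for the Kubernetes objects, and deploymentOwners and removeOwner are hypothetical helpers, not Wave's API: the first finds which Deployments to reconcile when a ConfigMap or Secret changes, the second drops a reference when a Deployment is being deleted so that garbage collection leaves the configuration alone.

```go
package main

import "fmt"

// ownerRef is a simplified stand-in for a Kubernetes Owner Reference.
type ownerRef struct {
	Kind string
	Name string
}

// configObject is a simplified ConfigMap or Secret.
type configObject struct {
	Name   string
	Owners []ownerRef
}

// deploymentOwners lists the Deployments recorded as owners, i.e. the
// Deployments to reconcile when this object changes.
func deploymentOwners(c configObject) []string {
	var names []string
	for _, o := range c.Owners {
		if o.Kind == "Deployment" {
			names = append(names, o.Name)
		}
	}
	return names
}

// removeOwner drops the reference to a Deployment that is being deleted,
// keeping every other owner untouched.
func removeOwner(c *configObject, deployment string) {
	kept := make([]ownerRef, 0, len(c.Owners))
	for _, o := range c.Owners {
		if o.Kind == "Deployment" && o.Name == deployment {
			continue
		}
		kept = append(kept, o)
	}
	c.Owners = kept
}

func main() {
	cm := configObject{Name: "bridge-config", Owners: []ownerRef{
		{Kind: "Deployment", Name: "bridge"},
		{Kind: "Deployment", Name: "bridge-canary"},
	}}
	fmt.Println(deploymentOwners(cm)) // both Deployments get reconciled on change
	removeOwner(&cm, "bridge")        // "bridge" is being deleted
	fmt.Println(deploymentOwners(cm))
}
```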
We have Wave deployed to our systems and enabled for a number of Deployments, both within our team and across the wider organization. Since starting to use it, we have been catching broken config more quickly (no more production incidents) and have been able to cut out a lot of manual deployment procedure (deploy config, slowly restart Pods).
The project has fixed one of our longest-standing issues. We have more confidence when deploying now and know that updates to the desired configuration will actually be running moments after they are applied.