Kubernetes Canary Deployments for User Beta Testing

Damien Marshall · Published in ITNEXT · Oct 17, 2019 · 5 min read


At NewsWhip we like to test new features that are in active development with real users, in order to ensure that what we’re building is valuable. Typically these features are rolled out internally first, so we can gather early feedback, followed by gradual releases to external users as the feature approaches production readiness.

While this process is invaluable for building features of the highest user value, the technical implications of running such a program were quite heavyweight. With our technology stack we essentially had two options available, both of which had issues for us:

  • Use feature flags to turn on features for specific users. This is problematic for us as we don’t have a robust feature flag system in place, so choosing this path would have had a large impact on code maintenance and testing, particularly for major changes.
  • Use a completely separate deployment behind another DNS name, such as betatest.newswhip.com. This is also problematic for us, as it required users to switch to a new site to use beta features, reducing engagement with the feature and shrinking the pool of available test users.

Enter Kubernetes Canary Deployments

With a recent migration of our infrastructure to Kubernetes, we started exploring if the deployment mechanisms available within that ecosystem could present a more robust solution to us, particularly “canary deployments”.

Canary deployments are well documented elsewhere, but in summary: a canary deployment is one where a subset of the servers running a service (the “canaries”) is updated with a new version of the service code, allowing the new code to be tested side by side with the existing code. The number of servers running the new version is gradually increased while the number running the old version is decreased, until the new version is fully rolled out once everything is determined to be working correctly. In the example below we can see how the number of nodes running version 2 of a service (the canary nodes) is gradually increased while operating alongside nodes running the older version of the service code.

Overview of how canary deployments work with a Kubernetes service and deployment

Canary deployments are invaluable as they allow for safe testing of new functionality in production, and for ensuring there is no degradation in performance or increase in error rates.

Canary deployments are often used in conjunction with Kubernetes services and deployments to safely manage rollouts of updates for more traditional “backend” services, such as an internal API service sitting in front of a database.
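As a minimal sketch of that pattern (the names yourapp-api, yourapp/api:v1, and so on are illustrative, not from our setup), a single Service can select pods from both a stable Deployment and a canary Deployment, so the traffic split roughly follows the replica ratio:

```yaml
# One Service whose selector matches pods from both Deployments below,
# so requests are split roughly in proportion to replica counts.
apiVersion: v1
kind: Service
metadata:
  name: yourapp-api
spec:
  selector:
    app: yourapp-api          # matches both stable and canary pods
  ports:
    - port: 9000
      targetPort: 9000
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: yourapp-api-stable
spec:
  replicas: 4                 # most traffic lands on the stable version
  selector:
    matchLabels: { app: yourapp-api, track: stable }
  template:
    metadata:
      labels: { app: yourapp-api, track: stable }
    spec:
      containers:
        - name: api
          image: yourapp/api:v1
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: yourapp-api-canary
spec:
  replicas: 1                 # roughly 20% of traffic hits the canary
  selector:
    matchLabels: { app: yourapp-api, track: canary }
  template:
    metadata:
      labels: { app: yourapp-api, track: canary }
    spec:
      containers:
        - name: api
          image: yourapp/api:v2
```

Scaling the canary Deployment up and the stable one down gradually shifts the ratio until the new version serves all traffic.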

However, the Nginx ingress controller, which NewsWhip uses to route requests from external users to our web applications, also supports canary deployments. Using this functionality, Nginx can route requests with a specific flag set to a canary service, behind which a fully scaled deployment is running, as shown in the image below.

Canary configuration within the Ingress allows routing to different services

We can see here how the beta user’s request has a cookie named inBeta set to “always”, which means that request is directed to the canary service; all other requests go to the production service. Nginx can route requests to a canary service based on a value set in a header, a value set in a cookie, or a percentage of requests.
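Concretely, ingress-nginx exposes these options as annotations on the canary ingress. Here is a sketch of the relevant keys (the header name X-In-Beta is purely illustrative):

```yaml
annotations:
  nginx.ingress.kubernetes.io/canary: "true"                # mark this ingress as the canary
  nginx.ingress.kubernetes.io/canary-by-header: "X-In-Beta" # route when this header is "always"
  nginx.ingress.kubernetes.io/canary-by-cookie: "inBeta"    # route when this cookie is "always"
  nginx.ingress.kubernetes.io/canary-weight: "10"           # or send ~10% of all requests
```

Only one routing rule is needed; when several are set, ingress-nginx evaluates them in the order canary-by-header, canary-by-cookie, canary-weight.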

With this in mind, we set about testing, and then using, the canary nginx configuration to run our beta testing infrastructure.

Using Canary Deployments with Nginx in Kubernetes

Setting up Nginx to handle routing of requests to different services is quite straightforward, but it can initially be a little confusing, as there are two separate YAML configuration files for the same ingress host: one for production and one for the canary. A high-level overview of our setup is as follows:

  • Two services running: one for production and one for canary. In our example these are yourapp-prod-service and yourapp-canary-service.
  • Production nginx running as normal, configured with all of the routes you need. The first manifest below shows this: yourapp.example.com is routed to yourapp-prod-service, listening on port 9000.
  • The canary is then configured via a separate ingress file (the second manifest below) that sets the canary annotations. To make this work we operate in the same namespace, prod, but use a different name so that we don’t replace the production nginx configuration. Any incoming request with the cookie inBeta set to “always” is then directed to yourapp-canary-service, also listening on port 9000. A key thing to understand here is that the cookie value must be “always”, as that is what Nginx requires for the canary setup.
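The two manifests below are a sketch of that setup, using the illustrative names from the list above (the Ingress API version and class wiring will depend on your cluster and ingress-nginx version):

```yaml
# Production ingress: normal routing for yourapp.example.com
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: yourapp-prod
  namespace: prod
spec:
  ingressClassName: nginx
  rules:
    - host: yourapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: yourapp-prod-service
                port:
                  number: 9000
```

```yaml
# Canary ingress: same host and namespace, different name, with the
# canary annotations that tell Nginx when to route here instead.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: yourapp-canary
  namespace: prod
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-by-cookie: "inBeta"
spec:
  ingressClassName: nginx
  rules:
    - host: yourapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: yourapp-canary-service
                port:
                  number: 9000
```

A quick way to check the split is from the command line: `curl --cookie "inBeta=always" https://yourapp.example.com` should reach the canary service, while the same request without the cookie reaches production.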

With this in place, a user can visit yourapp.example.com, and if the “inBeta” cookie is set to “always” on their requests, those requests will be directed to the canary deployment.

Devil in the details

To work with this infrastructure configuration we updated our application to check the beta status of the user when they log in, and to set the “inBeta” canary cookie accordingly if the user should be in the beta group. While this is a simple change, it did require some further involvement from our application to support, as described below. Whether this applies to you will depend on your system architecture.
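As a sketch of that login-time change (all names here are hypothetical, and we use an Express-style handler purely for illustration), the server looks up the user’s beta status and sets the canary cookie on the login response:

```typescript
import express from "express";

const app = express();
app.use(express.json());

// Hypothetical stand-in for a real beta-group lookup (e.g. a database query).
const betaUsers = new Set(["alice@example.com"]);

app.post("/login", (req, res) => {
  const email = String(req.body?.email ?? "");

  // ...real credential checks would happen here...

  // Set the canary cookie so Nginx routes this user's subsequent requests.
  // "always" and "never" are the two values the canary-by-cookie
  // annotation understands.
  const flag = betaUsers.has(email) ? "always" : "never";
  res.cookie("inBeta", flag, { path: "/" });

  res.json({ ok: true });
});

app.listen(9000);
```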

Of particular concern for us was ensuring that the client browser downloaded the correct JavaScript code bundle for the production or canary deployment. Given our use case of testing potentially vastly different versions of our applications with different users, there could be major incompatibilities between the two environments, meaning that if the client downloaded the wrong bundle the application would not function correctly.

However, in our application the client code bundle is sent in the immediate response to a successful login request. This means that if a user has just been added to the beta test group, their initial login request will not yet have the cookie set, so it will be directed to the wrong service and the client will download the incorrect bundle.

To deal with this, if we detect that a user’s login request has reached the incorrect service (e.g. the inBeta cookie should be set but isn’t), we immediately invalidate the session for the user but leave the inBeta cookie set correctly. This means the next login request will be sent to the correct service. While this requires the user to log in again, in practice it only happens when we move a user into or out of a beta group, which should be rare.
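A sketch of that guard, continuing the hypothetical setup above: each deployment is told which flavour it is (say, via an environment variable in the pod spec), and the login handler compares that with what the user’s beta status implies the routing should have been:

```typescript
import express from "express";

const app = express();
app.use(express.json());

// "always" on the canary deployment, "never" on production,
// e.g. injected through the pod spec.
const MY_FLAVOUR = process.env.DEPLOYMENT_FLAVOUR ?? "never";

// Hypothetical beta lookup, as in the previous sketch.
const betaUsers = new Set(["alice@example.com"]);

app.post("/login", (req, res) => {
  const expected = betaUsers.has(String(req.body?.email ?? ""))
    ? "always"
    : "never";

  if (expected !== MY_FLAVOUR) {
    // The request reached the wrong service (e.g. the user was just added
    // to the beta group). Invalidate the session but set the cookie
    // correctly, so the next login attempt is routed to the right service.
    res.clearCookie("session");
    res.cookie("inBeta", expected, { path: "/" });
    res.status(401).json({ error: "Please log in again" });
    return;
  }

  // Correct service: proceed with the normal login flow and
  // serve the matching client bundle.
  res.cookie("inBeta", expected, { path: "/" });
  res.json({ ok: true });
});

app.listen(9000);
```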

Conclusion

With this system in place we have a simple mechanism for seamlessly routing users to a completely different version of our application. There are other systems available with more robust handling of canary deployments, such as Istio, and we also touched on richer feature flag handling within our application. However, these would have required a major infrastructure investment that we didn’t need at this time, and as such they form the basis of future work.
