Musings about Istio with mTLS

Heiko W. Rupp · Published in ITNEXT · Dec 9, 2018

To start this off, I want to make it totally clear that I think mTLS in Istio is a pretty awesome feature, almost a unique selling point for Istio.

Kiali graph showing the traffic from productpage to details secured via mTLS

But it also has some pitfalls that can be hard to spot. And yes, this is documented, but it still took me a while to understand. In this article I want to provide some information about the setup, but also about debugging.
In the image above you can see what Kiali shows when mTLS is set up for a connection.

What is “mutual TLS” (mTLS) anyway?

TLS, or Transport Layer Security, makes sure that communication between services is encrypted. With the right configuration, certificates are also used to verify that services really are who they claim to be. A prime example of TLS is your web browser when you open https URLs.

mTLS extends this so that not only does the client (caller) verify the certificate of the server (called service), but the server verifies the client’s certificate as well.

Istio can, with the help of its Citadel component, set up mTLS between any two services, including the creation, distribution and checking of certificates.
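
You can see the certificates that Citadel created and distributed by listing them in a workload’s sidecar container. A minimal sketch, assuming Bookinfo runs in the bookinfo namespace (the pod name is from my cluster and will differ in yours):

$ kubectl exec details-v1-688965fbdf-b9w5w -c istio-proxy -n bookinfo -- ls /etc/certs
cert-chain.pem
key.pem
root-cert.pem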

Setting up mTLS for a single connection between two services

As Bookinfo is the Hello World of Istio, I am going to use it to explain how to set up mTLS from the productpage service to the details service, as shown in the above graph snippet.
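
If you want to follow along: Bookinfo ships with the Istio release. A minimal sketch of deploying it, assuming automatic sidecar injection and that you run this from the unpacked release directory:

$ kubectl create namespace bookinfo
$ kubectl label namespace bookinfo istio-injection=enabled
$ kubectl apply -n bookinfo -f samples/bookinfo/platform/kube/bookinfo.yaml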

There are two parts to this:

  1. Install a Policy to tell the details service that it wants to receive TLS traffic (only):

apiVersion: authentication.istio.io/v1alpha1
kind: Policy
metadata:
  name: details-receive-tls
spec:
  targets:
  - name: details
  peers:
  - mtls: {}

2. Install a DestinationRule to tell clients (productpage) to talk TLS with details:

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: details-istio-mtls
spec:
  host: details.bookinfo.svc.cluster.local
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
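
Both documents can be applied like any other Kubernetes resource. A minimal sketch, assuming they are saved as details-policy.yaml and details-dr.yaml (hypothetical file names):

$ kubectl apply -n bookinfo -f details-policy.yaml
$ kubectl apply -n bookinfo -f details-dr.yaml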

The following is a graphical representation of the involved services and where the previous two configuration documents apply.

Now when you look closely at the Policy above, you will see an entry for the peer authentication:

peers:
- mtls: {}

This means that TLS verification is strict and Istio (or rather the Envoy proxy in the pod) requires TLS traffic and a valid certificate. We can set a mode to get permissive behaviour instead:

peers:
- mtls:
    mode: PERMISSIVE
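
Put together, a permissive variant of the Policy from step 1 could look like the following sketch. PERMISSIVE accepts both plain text and mTLS traffic, which is handy during a migration:

apiVersion: authentication.istio.io/v1alpha1
kind: Policy
metadata:
  name: details-receive-tls
spec:
  targets:
  - name: details
  peers:
  - mtls:
      mode: PERMISSIVE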

Peeking at Envoy’s world view (regarding mTLS)

Now that we have seen how to configure mTLS, let’s have a look at how Envoy sees the world.

First the details service

Let’s start with the details service, with strict TLS requested. I am using the istioctl tool to query the details pod instance, requesting the listener configuration (that is, basically the proxy’s configuration for incoming requests).

$ istioctl proxy-config listener details-v1-688965fbdf-b9w5w

[
  {
    "name": "172.17.0.9_9080",
    "address": {
      "socketAddress": {
        "address": "172.17.0.9",
        "portValue": 9080
      }
    },

This first part is nothing fancy and just identifies the endpoint (the config contains multiple endpoints, but we are interested in the one for the “business” traffic). The first interesting section starts next:

"filterChains": [
{
"tlsContext": {
"commonTlsContext": {
"tlsCertificates": [
{
"certificateChain": {
"filename": "/etc/certs/cert-chain.pem"
},
"privateKey": {
"filename": "/etc/certs/key.pem"
}
}
],
"validationContext": {
"trustedCa": {
"filename": "/etc/certs/root-cert.pem"
}
},
"alpnProtocols": [
"h2",
"http/1.1"
]
},
"requireClientCertificate": true
},

The filter chain serves to detect whether the incoming traffic is TLS or not. The configuration indicates the location of the certificates in Envoy’s file system, the allowed protocols, and also that the client must present its certificate.

Next we’re looking at the filter configuration, which includes an Istio-specific http_filter:

"filters": [
{
"name": "envoy.http_connection_manager",
"config": {
/* ..omitted.. */
"http_filters": [
{
"config": {
"policy": {
"peers": [
{
"mtls": {}
}
]
}
},
"name": "istio_authn"
}

This filter looks at the policy, and if it is set to “mtls” as seen above, it will verify the certificate of the incoming request with the help of the previously defined certificates and certificate chain.

Now the productpage client configuration

We again use istioctl proxy-config, this time to obtain the cluster setup, and inspect the sending side of the connection to the details service. We can again use the details pod, as each pod has the full cluster view. The only difference is the inbound section for the local pod, which we are currently not interested in.

$ istioctl proxy-config cluster details-v1-688965fbdf-b9w5w -o json | less

{
  "name": "outbound|9080||details.bookinfo.svc.cluster.local",
  "type": "EDS",
  "tlsContext": {
    "commonTlsContext": {
      "tlsCertificates": [
        {
          "certificateChain": {
            "filename": "/etc/certs/cert-chain.pem"
          },
          "privateKey": {
            "filename": "/etc/certs/key.pem"
          }
        }
      ],
      "validationContext": {
        "trustedCa": {
          "filename": "/etc/certs/root-cert.pem"
        },
        "verifySubjectAltName": [
          "spiffe://cluster.local/ns/bookinfo/sa/default"
        ]
      },
      "alpnProtocols": [
        "istio"
      ]
    },
    "sni": "outbound|9080||details.bookinfo.svc.cluster.local"
  }
}

The output here has been cut a lot for brevity. If you run the command on your own, just search for “details” in the output. Again you see the TLS context with all the certificate information. The most interesting line here is the last one: the “sni” entry tells Envoy to use TLS and to pass the respective target server name to the called IP/port combination. Wikipedia has an article about the usage of SNI inside of TLS.
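
To double-check that the certificate Citadel issued really carries the SPIFFE identity listed under verifySubjectAltName, you can extract the subject alternative name from the mounted certificate. A sketch, assuming openssl is available on your machine (the pod name is again from my cluster):

$ kubectl exec details-v1-688965fbdf-b9w5w -c istio-proxy -n bookinfo -- \
    cat /etc/certs/cert-chain.pem | openssl x509 -noout -text \
    | grep -A1 "Subject Alternative Name"
        X509v3 Subject Alternative Name:
            URI:spiffe://cluster.local/ns/bookinfo/sa/default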

Use Istioctl to check the mTLS setup

Running istioctl authn tls-check can be helpful to spot issues (as the output is very wide and long, I have shortened it, but still had to put in a screenshot to get some sensible formatting):

Shortened output of istioctl authn tls-check

In the output you can see that productpage uses mesh-wide mTLS (see the next section); for details I have set up my own Policy and DestinationRule; and for xy.demo on the third line a conflict is detected, where the DR (= the client) says to use mTLS, but the xy.demo server has an override Policy that only allows plain HTTP.
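
For reference, a single (shortened and hand-wrapped) line of that output looks roughly like the following sketch; the exact layout and naming differ between Istio versions:

HOST:PORT                                 STATUS  SERVER  CLIENT  AUTHN POLICY                  DESTINATION RULE
details.bookinfo.svc.cluster.local:9080   OK      mTLS    mTLS    details-receive-tls/bookinfo  details-istio-mtls/bookinfo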

Setting up mesh-wide mTLS

Setting up mesh-wide mTLS is described well in the Istio documentation. To check if this MeshPolicy is enabled on your cluster, you can run the following:

$ kubectl get MeshPolicy default
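
The resource itself, as shown in the Istio documentation, is essentially the mesh-wide analogue of the Policy from before:

apiVersion: authentication.istio.io/v1alpha1
kind: MeshPolicy
metadata:
  name: default
spec:
  peers:
  - mtls: {}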

When mesh-wide mTLS is set up, all connections between services in the mesh are secured via mTLS. But not only those: connections to services inside the cluster but outside the mesh are affected as well. Just have a look at this line in the accompanying DestinationRule:

spec:
  host: "*.local"

“*.local” matches every service in the local Kubernetes cluster, including calls to the Kubernetes API. This means that for those calls mTLS must be explicitly switched off, which is again described in the docs.
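
The way described there is a DestinationRule for the API server, along the lines of this sketch:

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: api-server
spec:
  host: kubernetes.default.svc.cluster.local
  trafficPolicy:
    tls:
      mode: DISABLE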

Likewise, calls from a non-secured client like the kubelet doing health checks are now rejected by Envoy, because the MeshPolicy dictates mTLS for all services. The normal http-based health checks cannot be used in Istio 1.0.x and older; you need to switch to having the kubelet execute a command inside the container that does the check and reports its return code, as shown below. In theory you could also switch to permissive TLS, but then you lose most of the benefits, so this is not recommended.
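
Such an exec-based check could look like the following sketch in the container spec. The /health path, the port and the use of curl are assumptions about your application; calls to localhost are not captured by the sidecar, so plain http works here:

livenessProbe:
  exec:
    # runs inside the application container; localhost bypasses the Envoy sidecar
    command:
    - curl
    - -f
    - http://localhost:9080/health
  initialDelaySeconds: 10
  periodSeconds: 5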

Another option is (if your code allows it) to put the health check endpoint on a different port from the business traffic.

Separation of business traffic and health checks

To achieve this, we need to do two things:

  1. Scope the Policy to port 9080 of the details service and leave out the mtls peer, which disables mTLS on that port:

spec:
  targets:
  - name: details
    ports:
    - number: 9080
  peers:
  # no mtls entry means disable

2. Disable mTLS on the client side for the health check port of the details service. As we switched mTLS on for the entire mesh, port 80 will still be covered by mTLS:

spec:
  host: details
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
    portLevelSettings:
    - port:
        number: 9080
      tls:
        mode: DISABLE

Istio 1.1, mTLS and health checks

Istio 1.1 will (most likely) have a different behaviour, where the health check call from the kubelet goes to Istio’s pilot agent, which then calls the specified application’s health check endpoint(s) via mTLS. This is implemented by rewriting the pod spec on sidecar injection to provide a different port to which the kubelet sends its requests. Applications do not need to modify any code and can continue using a single port for both business traffic and health checking. As of the time of this writing, this is still work in progress and the documentation is not yet ready. Istio issue 9150 has all the information though.

Some things to look out for…

One thing which can be very confusing is the way Istio handles multiple DestinationRules for the same target. We have seen this implicitly in the last example, but in practice it can have bad effects when you move from a cluster where mTLS is not enabled for the entire mesh to one where it is.

The previous example was explicit: mode: DISABLE switches off mTLS (for sending). The pitfall is rather when you specify a DestinationRule like the following while mTLS is on mesh-wide:

spec:
  host: details
  subsets:
  - name: a-subset
    labels:
      version: abc

This one does not mention a trafficPolicy, which is totally fine when mTLS is not used. But with the details service requiring mTLS, this will basically make all calls to the details service fail. The reason is that Istio does not ignore the missing trafficPolicy, but rather overwrites the global one with “nothing”, which effectively disables TLS on the sending side.
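
In practice this typically surfaces as HTTP 503 responses generated by Envoy. A sketch of what a call from inside the mesh then returns (the exact wording varies with the Envoy version):

$ curl -s http://details:9080/details/0
upstream connect error or disconnect/reset before headers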

To fix this issue, we need to explicitly add the trafficPolicy back in (even though it is specified mesh-wide):

spec:
  host: details
  subsets:
  - name: a-subset
    labels:
      version: abc
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL

I personally would like to see some inheritance here, to make DestinationRules usable both in scenarios where mesh- or namespace-wide mTLS is enabled and where it is disabled. Either by specifying in the global DestinationRule that the mtls policy is passed down, or by allowing one to say:

# Just an idea, will not work in current Istio 1.0/1.1
trafficPolicy:
  peers:
  - mtls:
      mode: INHERIT

The former is probably less typing work :-) for users writing DestinationRules, as they can just use the defaults without further work.

What does that mean for Kiali?

As of this writing (Dec 7th, 2018), Kiali does not yet show all the relevant details. If a DestinationRule enables mTLS, we show the lock icon. We do not yet show if mTLS is only set up halfway (Policy missing, or Policy set up but DestinationRule not enabling mTLS). You can at least get a hint:

Kiali showing traffic to details failing because details expects TLS, but a DestinationRule on productpage forgot to repeat the global setting (to send TLS). The lock on the connection is missing for that reason

Unfortunately this is not explicit enough at the moment, but it is at least a hint; you can then run istioctl authn tls-check to verify this, as seen above.

There are a bunch of open tickets (KIALI-2092, KIALI-2108, and also KIALI-2082) to display more Policy-related data, so stay tuned (or, if you want to speed this up, help us implement it :-).

Call to action

Please let me know if this post is useful, and also help us make Kiali display the respective information in the most useful way possible.

Thanks go to ...

I want to thank Thomas Heute, Dale Babiy, John Mazzitelli, Diem Vu, Tao Li and Julie Stickler for giving me good feedback on the draft of this post.
