Minikube NATs the Source IP Address of Pods for Services with ClusterIP

This article describes a network issue I ran into while using Minikube as a development environment for running and testing Kuberhealthy locally, and how I troubleshot it.

The symptom: the Pod running a KHCheck (a custom resource in Kuberhealthy that works like a probe, running tests and reporting results back to the Kuberhealthy service) was unable to send its result back to the Kuberhealthy service in the same namespace of a Kubernetes cluster on Minikube. The details are described in the Issue section below.

Before starting, let me clarify my environment:

Introductions

A quick introduction to Minikube and Kuberhealthy, which I was using when I ran into this network issue. If you are familiar with them, feel free to skip this section and jump to Issue directly.

Minikube is a popular Kubernetes distribution that runs on a single machine for testing and development purposes. By default, the following command sets up a Kubernetes cluster with a single node.

The single-node cluster is now ready, and you can run kubectl against it as you would against a regular Kubernetes cluster.

Now the Minikube cluster is ready for deploying Kuberhealthy.

Kuberhealthy works as a Kubernetes operator for synthetic monitoring of Kubernetes resources. It can send its results as metrics to Prometheus, which works with Grafana to visualize the metrics as dashboards and can also trigger alerts based on rules.

I use Kuberhealthy to validate the settings and add-ons of Kubernetes clusters for beta and production environments. For more details, please refer to the Kuberhealthy GitHub repo.

Once installed on the Minikube cluster, Kuberhealthy includes two services: deployment-svc and kuberhealthy.

The deployment-svc service checks whether a deployment can be completed successfully in a Kubernetes cluster and then sends the results to the kuberhealthy service.

The kuberhealthy service receives results from the checks and exposes them on its metrics endpoint, which is visible to Prometheus.

Issue

When I ran the deployment check with Kuberhealthy on Minikube, the check result failed to be sent back to the kuberhealthy service.

The Error status of the deployment-1609540608 Pod doesn’t mean the deployment check itself failed, and I didn’t see any metrics on the Prometheus side. So I checked the logs of the deployment-1609540608 Pod:

From the logs, it seems the deployment test succeeded, but an exception was thrown while reporting the result to the kuberhealthy service.

Troubleshooting

Troubleshooting was not easy, even though a detailed call stack followed the error message. The call stack was confusing and misled me into focusing on the Kuberhealthy side for a while.

First, the call stack was the most information I had:

The call stack also shows that the exception happened while reporting success, which invokes the sendReport method of checkclient:

pkg/checks/external/checkclient/main.go

The exception is caused by err.Error(), which is not suitable for writeLog; an issue reporting this bug has also been filed in the Kuberhealthy repo. At the time of writing, the PR fixing this bug had not been merged yet.

However, the PR only fixes the string format exception, not the failure to send the report, because err.Error() is empty. Consider the condition check at line 98:

I suspected the response status code was NOT 200, but it was missing from the logs. So I changed my local copy of the Kuberhealthy code to log the response status code, as below:

Fixed the format bug and added a log for StatusCode
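
To illustrate the idea of that change, here is a minimal Go sketch, not Kuberhealthy's actual code. The names checkReportResponse and fakeResponse are hypothetical and exist only for this example. On the non-200 path there is no Go error to print, so the sketch logs the response status instead.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
)

// checkReportResponse is a hypothetical helper (not part of Kuberhealthy).
// It treats any non-200 response as a failure and logs the status, because
// on this path there is no Go error value whose message could be printed.
func checkReportResponse(resp *http.Response) error {
	if resp.StatusCode != http.StatusOK {
		log.Printf("ERROR: got a bad status code from the reporting URL: %s", resp.Status)
		return fmt.Errorf("bad status code: %d", resp.StatusCode)
	}
	return nil
}

// fakeResponse builds an in-memory response with the given status code,
// so the sketch runs without a cluster or a network connection.
func fakeResponse(status int) *http.Response {
	rec := httptest.NewRecorder()
	rec.WriteHeader(status)
	return rec.Result()
}

func main() {
	fmt.Println(checkReportResponse(fakeResponse(http.StatusOK)))
	fmt.Println(checkReportResponse(fakeResponse(http.StatusBadRequest)))
}
```

With a log line like this in place, a 400 shows up in the check Pod's logs even when the request itself was sent without a transport error.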

It was easy to compile my change from a local clone of Kuberhealthy and build a Docker image to deploy to my Minikube cluster.

It failed with a Bad Request error. How did that happen? Why did the Kuberhealthy service return a Bad Request error? I had to go through the service code to find the reason.

In the externalCheckReportHandler method, several code paths return the Bad Request error, and I started by checking the first one:

cmd/kuberhealthy/kuberhealthy.go

Searching the logs of the kuberhealthy service pod, I found:

It shows the kuberhealthy service received a request from IP 172.17.0.1 and could not find a pod with that IP address, so it treated the request as invalid and returned a 400 (Bad Request) response.

Now it is clear: the real IP address of the requester is 172.17.0.5, but the kuberhealthy service sees 172.17.0.1. Therefore, the validation fails and a Bad Request response is returned.
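
The failure mode can be reproduced in isolation. The sketch below is my own simplified stand-in, not the real externalCheckReportHandler: the podsByIP map replaces the pod lookup against the Kubernetes API, and the /report path and the statusFor helper are hypothetical. It accepts a report only when the remote IP belongs to a known pod.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
)

// podsByIP is a hypothetical stand-in for looking up pods by IP address.
var podsByIP = map[string]string{
	"172.17.0.5": "deployment-1609540608",
}

// reportHandler rejects any request whose source IP does not match a known pod.
func reportHandler(w http.ResponseWriter, r *http.Request) {
	ip, _, err := net.SplitHostPort(r.RemoteAddr)
	if err != nil {
		http.Error(w, "cannot parse remote address", http.StatusBadRequest)
		return
	}
	pod, ok := podsByIP[ip]
	if !ok {
		http.Error(w, "no pod found with IP "+ip, http.StatusBadRequest)
		return
	}
	fmt.Fprintf(w, "accepted report from %s\n", pod)
}

// statusFor sends an in-memory request that appears to come from remoteAddr
// and returns the HTTP status code the handler answered with.
func statusFor(remoteAddr string) int {
	req := httptest.NewRequest(http.MethodPost, "/report", nil)
	req.RemoteAddr = remoteAddr
	rec := httptest.NewRecorder()
	reportHandler(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("172.17.0.5:40000")) // the pod's real IP
	fmt.Println(statusFor("172.17.0.1:40000")) // the IP the service saw after NAT
}
```

The same request is accepted or rejected depending only on the source address the server observes, which is why a rewritten source IP surfaces as a Bad Request.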

Why does it happen?

The Kubernetes article Using Source IP explains how Kubernetes handles source IPs for the various types of services.

The kuberhealthy service is exposed as a service of type ClusterIP. The section Source IP for Services with Type=ClusterIP says:

Packets sent to ClusterIP from within the cluster are never source NAT’d if you’re running kube-proxy in iptables mode, (the default). You can query the kube-proxy mode by fetching http://localhost:10249/proxyMode on the node where kube-proxy is running.

According to this, the source IP address of a request to the kuberhealthy service should NOT be replaced by default.

Let’s quickly verify this with the instructions in the doc:
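
The proxy mode query mentioned in the quote can also be scripted. This is a minimal sketch under assumptions: fetchProxyMode, stubTransport, and stubbedMode are hypothetical names, and the stub returns a canned answer so the example runs without a cluster. On a real node you would pass http.DefaultClient and the URL from the quote instead.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// fetchProxyMode reads the kube-proxy mode from the given URL.
func fetchProxyMode(client *http.Client, url string) (string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(body)), nil
}

// stubTransport answers every request with a fixed body (assumption: used
// only so this sketch runs without a kube-proxy to talk to).
type stubTransport struct{ body string }

func (s stubTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	return &http.Response{
		StatusCode: http.StatusOK,
		Status:     "200 OK",
		Body:       io.NopCloser(strings.NewReader(s.body)),
		Header:     make(http.Header),
		Request:    req,
	}, nil
}

// stubbedMode runs fetchProxyMode against the stub with a canned response.
func stubbedMode(body string) (string, error) {
	client := &http.Client{Transport: stubTransport{body: body}}
	return fetchProxyMode(client, "http://localhost:10249/proxyMode")
}

func main() {
	mode, err := stubbedMode("iptables\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(mode)
}
```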

  • Create a deployment for a small nginx webserver that echoes back the source IP of requests it receives through an HTTP header:
  • Create a service of type ClusterIP:
  • Try to hit the service from a Pod (busybox) in the same cluster:

This confirms that the source IP address of the pod (172.17.0.7) was replaced with 172.17.0.1.
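
For readers who want to see what such an echo server does, here is a small Go sketch. It is not the image used in the steps above, and the client_address= line format, echoClientAddress, and echoFor are assumptions chosen for illustration. The handler simply prints the source address it observes for each request.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
)

// echoClientAddress writes back the source IP the server sees for a request.
func echoClientAddress(w http.ResponseWriter, r *http.Request) {
	ip, _, err := net.SplitHostPort(r.RemoteAddr)
	if err != nil {
		ip = r.RemoteAddr // fall back to the raw value if it has no port
	}
	fmt.Fprintf(w, "client_address=%s\n", ip)
}

// echoFor simulates a request arriving from remoteAddr and returns the reply.
func echoFor(remoteAddr string) string {
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	req.RemoteAddr = remoteAddr
	rec := httptest.NewRecorder()
	echoClientAddress(rec, req)
	return rec.Body.String()
}

func main() {
	fmt.Print(echoFor("172.17.0.7:51234")) // what the pod expects to be seen as
	fmt.Print(echoFor("172.17.0.1:51234")) // what arrives when the source is NATed
}
```

The server can only report the address on the connection it receives, so if the packet's source was rewritten on the way, the pod's own IP never reaches it.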

Solution

My workaround is to use Calico, a network plugin for containers.

Calico is an open source networking and network security solution for containers, virtual machines, and native host-based workloads.

It is quite easy to apply Calico with Minikube:

Now, let’s re-verify the source IP address with a service of type ClusterIP:

It works!

Then, when I tried Kuberhealthy again on this Minikube cluster, it also worked as expected.

