How we connected over 250,000 IoT devices to the cloud
Inheaden recently helped one of their customers connect IoT devices to the cloud using Kubernetes Kosmos. Christian Hein, CIO at Inheaden, will be telling us how they pulled that off, the challenges they had to overcome, and the detours they had to take to get to where they needed to be. But we’ll let Chris fill you in on the details.
A customer of ours in the Internet of Things (IoT) realm produces IoT devices that send data such as GPS position, battery status, sensor data, etc. The communication with IoT devices flows via UDP, and a proprietary microservice then picks up these UDP packets and decodes them.
The requirement was quite simple: the backend should be able to send a packet to a single IoT device which turns a function on or off. While that sounds pretty simple, it wasn’t immediately possible because of the specific way the infrastructure was set up. We needed the real IP of the IoT device to control the function with a direct connection, but for various reasons, which we’ll get into below, the IP was not preserved in the transmission.
We eventually found a solution using Scaleway’s Kubernetes Kosmos and Envoy. But let’s back up for a bit to see where we started and how we got there.
Connecting IoT devices with a mobile connection and limited bandwidth
The mobile connection of the IoT devices is established with LPWAN radio technology. This radio standard, which was designed for machine-to-machine (M2M) communication, is energy efficient, can penetrate buildings, and transmits data reliably. The downside of this technology is that we have limited bandwidth available.
Because of this, protocols with a lot of overhead, like TCP, are not the best way to send data from a device to the backend. And since HTTP/1 and HTTP/2 are based on TCP, using HTTPS for the connection is not desirable either.
So how do these devices transmit data securely?
With the LPWAN radio standard, a private Access Point Name (private APN) from the mobile network operator (MNO) is used to transmit the data from the IoT devices to the company network over a legacy VPN connection. This legacy VPN connection terminates on a virtual machine (VM). Every device gets a private IP out of a private network CIDR, for example, 10.200.0.0/16.
Switching cloud providers
At our client’s previous provider, the infrastructure was set up so the legacy VPN tunnel was directly connected to the corresponding servers for a production and preview environment.
After switching to Scaleway a few years back, we set things up so that the UDP packets from the IoT devices arrived at the production server via a WireGuard VPN tunnel. On the production and staging VMs, we had a static internal IP to which the IoT devices sent their data.
The incoming data on the production and staging server was then proxied to the Kubernetes Cluster.
Our challenge: to preserve the IP of the IoT device
The challenge that came with the old infrastructure was that the client IP of the IoT device (e.g., 10.200.21.22) was not preserved: we couldn't see the real IP of the device because there were several NAT layers in between.
Even when we tried to route the packets all the way through, we still had a NAT layer in the last stage, where the UDP packets entered the Kubernetes cluster via a NodePort. There, the source IP was changed to the internal IP of the Kubernetes node (the details are covered in the Kubernetes networking documentation).
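To make that NAT step concrete, here is a minimal sketch of what a UDP NodePort service like the one described could look like; the names and ports are placeholders, not our actual configuration.

apiVersion: v1
kind: Service
metadata:
  name: decoder-nodeport        # placeholder name
spec:
  type: NodePort
  selector:
    app: decoder                # placeholder label of the decoding microservice
  ports:
    - name: iot-udp
      protocol: UDP
      port: 5000                # placeholder port
      targetPort: 5000
      nodePort: 30500           # placeholder NodePort
  # With the default externalTrafficPolicy (Cluster), kube-proxy SNATs the
  # incoming UDP packets to the node's internal IP before they reach the pod,
  # so the decoding service never sees the device's 10.200.x.x address.
  # externalTrafficPolicy: Local would keep the client IP, but only delivers
  # traffic to pods running on the node that received the packet.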
For nearly two years, it wasn't necessary to see the IoT device's IP: the UDP packets arriving at the decoding connector carried the device's IMEI and IMSI, which could then be matched in the backend.
That is, until our customer wanted the backend to be able to send a packet to the IoT device to turn a function on or off. For that, we needed the real IP of the IoT device, and that wasn't going to work with the NAT layers in between. We could answer packets coming from the IoT device, but nothing more.
How would we solve this problem?
Proxy Protocol to the rescue?
After some research, we discovered that Proxy Protocol (HA Proxy - Proxy Protocol) might help us preserve the client IP. A lot of implementations of the Proxy Protocol deal with normal HTTP connections. But, as mentioned above, HTTP doesn’t work for us. We needed an implementation that works with UDP.
And there, the pool of possible solutions shrank considerably: we could only find a few approaches that might fit our needs.
The udppp/mmproxy approach
mmproxy is a tool developed by Cloudflare to preserve client IPs in a UDP environment. In addition to mmproxy, we also used a small tool called udppp, which listens on a local port to which the original UDP packets are sent.
udppp adds the Proxy Protocol header to the packet and forwards it to an IP you define on the command line. mmproxy then picks up this packet, removes the Proxy Protocol header, and re-sends the payload from a socket with the "magical" IP_TRANSPARENT option set, so the packet appears to come from the original client IP. Read more about IP Transparent mode in this blog from Cloudflare.
With the help of Andy Smith’s blog post about preserving client IPs in Kubernetes, we implemented a sidecar container on the decoding microservice. After some tests, we saw that this approach was working — the client IP was preserved.
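To give an idea of the setup, here is a minimal sketch of such a sidecar layout, assuming a Deployment for the decoding microservice; the image names, ports, and labels are placeholders, and the udppp/mmproxy arguments are omitted since they depend on how those tools are configured.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: decoder
spec:
  replicas: 1
  selector:
    matchLabels:
      app: decoder
  template:
    metadata:
      labels:
        app: decoder
    spec:
      containers:
        - name: decoder                               # the proprietary decoding microservice
          image: registry.example.com/decoder:latest  # placeholder image
          ports:
            - containerPort: 5000                     # placeholder port the decoder listens on
              protocol: UDP
        - name: proxy-protocol-sidecar                # udppp + mmproxy sidecar
          image: registry.example.com/udppp-mmproxy:latest  # placeholder image
          securityContext:
            capabilities:
              add: ["NET_ADMIN"]                      # IP_TRANSPARENT sockets require CAP_NET_ADMIN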
While the approach was a success, some questions were still not answered:
How are packets flowing back to the IoT device without passing a NAT layer?
How scalable is this solution?
To solve the first issue, we tried out several iptables hacks to send the packets back, but ultimately, this approach failed — the return route wasn’t possible.
So what now?
The Wireguard Pod approach
As mentioned before, we used WireGuard to connect the legacy servers to a gateway server. So we thought, "Why don't we try to move that connection into the cluster?"
With this idea in mind, we created a WireGuard pod that established a secure VPN connection to our gateway server. A few iptables hacks and a few errors later, we found out that this approach didn't work either, because the WireGuard network isn't known to the Kubernetes node, so cluster-internal routing isn't possible in this case. Policy-based routing didn't work either, and neither did the return route.
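For reference, a minimal sketch of what such a WireGuard pod could look like is shown below; the image, paths, and secret names are placeholders, not what we actually ran.

apiVersion: v1
kind: Pod
metadata:
  name: wireguard-gateway
spec:
  containers:
    - name: wireguard
      image: linuxserver/wireguard:latest       # placeholder; any WireGuard-capable image
      securityContext:
        capabilities:
          add: ["NET_ADMIN"]                    # needed to create and configure the wg interface
      volumeMounts:
        - name: wg-config
          mountPath: /config                    # placeholder path expected by the image
  volumes:
    - name: wg-config
      secret:
        secretName: wireguard-gateway-config    # placeholder secret holding the tunnel config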
It was hard to accept that these two approaches just weren't going to work. So I decided to take a step back and think the problem through again. Then I remembered having read something about peers in the documentation of the Kubernetes CNI Kilo, which powers Scaleway's true multi-cloud offer, Kubernetes Kosmos.
Kubernetes Kosmos and Envoy to the rescue!
After reading some documentation about Kilo peers, I tried out the approach with a Kilo peer resource:
apiVersion: kilo.squat.ai/v1alpha1
kind: Peer
metadata:
  name: gw-peer
spec:
  allowedIPs:
    - 10.4.0.99/32 # Single IP for the peer
    - 192.168.0.0/24 # Device testing subnet
  publicKey: <PublicKey of the Peer>
  persistentKeepalive: 10
With the command-line tool kgctl, we generated the WireGuard configuration for the gateway side. After adding this configuration, the exciting moment came, and we tested our approach on a pod. Would the client IPs be preserved, and would we have a bidirectional connection back to the test setup? Fortunately, the answer to both questions was yes: it worked! We proceeded to connect the test setup to our Kosmos cluster.
A small step back
To distribute the traffic of the IoT devices, we needed the ability to load balance the incoming UDP packets. The simplest solution was to use the IP of the Kubernetes service.
We tried that approach, but the decoding service then saw the internal Kilo IP of the node. We found out that packets sent to a Kubernetes service IP get source NATed (SNAT). We created an issue on the Kilo GitHub repo, and the project's maintainer confirmed that Kubernetes will SNAT the packets in the current implementation.
Unfortunately, this breaks the attempt to preserve the source IP of the IoT device. And without the source IP, we still can't address the IoT device.
The last piece in the puzzle: Envoy
To solve the challenges we faced, the following points needed to work:
We need load balancing of UDP packets onto our Kubernetes pods
As a pod's IP address changes every time it is rescheduled, the system needs to pick up the new pod IPs
The source IP of the IoT device must be preserved
After some research into proxies that could solve our problem, we stumbled across Envoy. After a few tries, we developed a config that covered all the points mentioned above:
Envoy supports load balancing with UDP packets.
Envoy can pick up changed pod IPs via the dns_resolvers option; all we needed to do was create a headless service in Kubernetes (see the sketch after this list).
With the option use_original_src_ip: true, we were able to keep the original IP from the IoT device.
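As an illustration of these three points, here is a minimal sketch, first of a headless service for the decoding pods and then of a simplified Envoy configuration along the lines described above. All names, ports, namespaces, and addresses are placeholders, not our production configuration.

apiVersion: v1
kind: Service
metadata:
  name: decoder-headless          # placeholder name
spec:
  clusterIP: None                 # headless: DNS returns the individual pod IPs
  selector:
    app: decoder                  # placeholder label of the decoding pods
  ports:
    - name: iot-udp
      protocol: UDP
      port: 5000                  # placeholder port

A simplified Envoy bootstrap could then combine a UDP listener running the udp_proxy filter with use_original_src_ip and a STRICT_DNS cluster that resolves the headless service name through the cluster DNS:

static_resources:
  listeners:
    - name: iot_udp_listener
      address:
        socket_address:
          protocol: UDP
          address: 0.0.0.0
          port_value: 5000                       # placeholder listening port
      listener_filters:
        - name: envoy.filters.udp_listener.udp_proxy
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.udp.udp_proxy.v3.UdpProxyConfig
            stat_prefix: iot_udp
            cluster: decoder_pods
            use_original_src_ip: true            # keep the IoT device's IP as the source
  clusters:
    - name: decoder_pods
      type: STRICT_DNS                           # re-resolves DNS, so changed pod IPs are picked up
      dns_lookup_family: V4_ONLY
      dns_resolvers:
        - socket_address:
            address: 10.96.0.10                  # placeholder: the cluster DNS service IP
            port_value: 53
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: decoder_pods
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: decoder-headless.default.svc.cluster.local  # the headless service above
                      port_value: 5000

Note that spoofing the original source IP typically requires Envoy to run with the CAP_NET_ADMIN capability, and the return traffic has to be routed back through the node running Envoy.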
After all the criteria were fulfilled, we set up a production environment where the server with the VPN tunnel was connected as a Kilo peer to the cluster. After extensive testing, everything worked as expected.
Once we knew how to make it work with one device, we connected over 250,000 IoT devices to the cloud using Scaleway's Kubernetes Kosmos solution.