What is a load balancer?
As your business grows, the network throughput of your infrastructure increases, and so do your servers' response and processing times. That's when load balancers come into play.
At Scaleway, the Load Balancer (LB) is an important product in our range of network components. Many organizations running several Instances use a Load Balancer to distribute traffic across their virtual machines, allowing them to absorb sudden spikes in traffic while ensuring the redundancy of their Instances.
Testing also showed that by running the Load Balancer in a multi-AZ configuration, we could leverage the expertise gained over the last year and improve the failover system. Today, we would like to share a story about our Load Balancer because, sometimes, important features are the result of experimentation!
Our Load Balancer comes with a high level of service because it acts as the gateway to a customer's infrastructure. If it were to fail, the services hosted behind it would no longer be accessible. Without a backup solution, the Load Balancer becomes a single point of failure that many infrastructures cannot afford.
To resolve this issue, we host our Load Balancers on Scaleway Instances and create a second instance at the same time, in the same Availability Zone. If the main instance fails, this second instance automatically takes over. The whole process takes up to 30 seconds and is fully transparent for you and your users.
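The active/standby mechanism described above can be sketched as follows. This is a minimal, hypothetical illustration, not Scaleway's actual implementation: the `Instance` and `FailoverPair` names, and the polling-based health check, are assumptions made for the example.

```python
# Hypothetical sketch of active/standby failover: a standby instance created
# alongside the primary is promoted when the primary stops passing health checks.

class Instance:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def health_check(self):
        # Stand-in for a real probe (e.g. an HTTP or TCP check).
        return self.healthy


class FailoverPair:
    """Primary and standby instances created together in the same Availability Zone."""

    def __init__(self, primary, standby):
        self.primary = primary
        self.standby = standby
        self.active = primary

    def tick(self):
        # Run periodically; promote the standby if the active primary fails.
        if self.active is self.primary and not self.primary.health_check():
            self.active = self.standby
        return self.active


pair = FailoverPair(Instance("lb-main"), Instance("lb-backup"))
pair.primary.healthy = False      # simulate a failure of the main instance
print(pair.tick().name)           # → lb-backup
```

The trade-off this sketch makes visible is the one discussed below: the standby instance sits idle, consuming hardware and energy, for the whole lifetime of the pair.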
This mechanism ensures high availability and resiliency for our clients’ infrastructure.
Our first goal was to optimize the secondary instance used by the failover system, since it doubles the resources consumed in terms of hardware and energy. Fortunately, failover is rarely triggered: only 10% of LBs switch to a secondary instance in a given month, because GP (General Purpose) Instances are relatively well configured.
So, the first idea was to reduce the instance size and see if we could achieve the same level of performance with smaller machines. We quickly found that this wasn’t suitable as it would cause a decrease in the level of service provided.
In the meantime, we were also testing a multi-AZ Load Balancer architecture. This meant analyzing our ability to replicate a Load Balancer instance on demand in another Availability Zone to take over the traffic. The results were surprisingly fast: in less than 5 seconds, we could launch a replacement Load Balancer instance, either in the same Availability Zone or in another one within the same Region.
We then applied this principle to the architecture of the Load Balancer itself, removing the second instance and replacing it with a pool of instances dedicated to Load Balancer backup, ready to take over if anything goes wrong with the main instance. When needed, a deployment script runs and the Load Balancer switches to another instance from the pool.
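The pool-based design can be sketched like this. All names here (`BackupPool`, `deploy`, the config fields) are illustrative assumptions, not Scaleway's real API; the point is that one shared pool of warm instances replaces a dedicated standby per Load Balancer.

```python
from collections import deque

# Illustrative sketch (not Scaleway's actual implementation) of replacing a
# faulty Load Balancer with a pre-provisioned instance from a shared pool.

def deploy(instance_name, config):
    # Placeholder for the deployment script mentioned above, which would
    # push the faulty LB's configuration onto the replacement instance.
    print(f"deploying {config['name']} config on {instance_name}")


class BackupPool:
    def __init__(self, instance_names):
        # Warm instances kept ready and shared by all Load Balancers,
        # instead of one idle standby per LB.
        self.ready = deque(instance_names)

    def replace(self, faulty_lb_config):
        # Take the next available instance and configure it as the new LB.
        replacement = self.ready.popleft()
        deploy(replacement, faulty_lb_config)
        return replacement


pool = BackupPool(["spare-1", "spare-2", "spare-3"])
new_lb = pool.replace({"name": "lb-main"})
print(new_lb)  # → spare-1
```

Because the pool is shared and failover is rare, far fewer instances need to sit idle than in the one-standby-per-LB design, which is the efficiency gain described below.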
As a result, a faulty Load Balancer converges faster, and architecture maintenance is far easier for our Network team. This also reduces the risk of potential failures, without lowering the high level of service provided.
In a world where the digital sector is often put under the spotlight for its high energy consumption, we saw an opportunity to reduce our environmental impact further, and with it use less hardware. Through the changes we made to the product, we were able to remove thousands of instances that were running idle almost 100% of the time for a relatively rare scenario.
If you are currently using a Load Balancer in your Scaleway infrastructure, you can leverage this new design to scale the size of your Load Balancer up and down. And thanks to Private Networks, your IPs and routing configuration are maintained in case of failover. This is a concrete, smooth way to provide you with the best service while avoiding unnecessary costs.
This improvement to the architecture of our Load Balancers makes them less complex, more resilient, and ensures they are fully operational in our multi-AZ environment.
There’s a lot coming up for our Load Balancers in 2022, especially by the end of the first half of the year, as we have lots of new offers and features planned.
Stay tuned to see what is coming next!