We recommend that you carefully consider what enabling or disabling autoheal involves, based on your specific use case requirements and operational considerations.
Using the Scaleway Kubernetes Kapsule autoheal feature
The Scaleway Kubernetes Kapsule autoheal feature is designed to automatically detect and recover from failures within a Kubernetes cluster. It provides a proactive approach to maintaining the health and availability of cluster nodes by automatically addressing issues that may arise. The autoheal feature periodically checks the health of the Kubernetes cluster and takes action based on predefined conditions.
You can enable the autoheal feature to ensure that your applications remain operational even in the event of failures. Some common use cases include:
- Enhanced reliability: By automatically recovering from failures, autoheal improves the reliability of nodes forming the Kubernetes cluster.
- Fault tolerance: It enhances the fault tolerance of the Kubernetes cluster by detecting and addressing node failures.
- Reduced downtime: By automatically detecting and recovering from failures, autoheal reduces downtime and minimizes the impact on application performance.
- Operational efficiency: It reduces the need for manual intervention in addressing failures, thereby improving operational efficiency.
Autoheal process
The autoheal reconciliation loop is triggered every five (5) minutes. If a node remains in the NotReady state for more than 15 minutes, it is rebooted (only once); after 30 minutes, it is replaced.
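The timing rules above can be sketched as a small decision function. This is an illustrative model only, not Scaleway's implementation: the names `decide`, `simulate`, and `Action` are hypothetical, and the sketch assumes that both thresholds are measured from the moment the node became NotReady, that the comparisons are strict ("more than"), and that a reconciliation pass happens to run at minute 0.

```python
from __future__ import annotations

from enum import Enum

# Timings taken from the text above.
CHECK_INTERVAL_MIN = 5
REBOOT_AFTER_MIN = 15
REPLACE_AFTER_MIN = 30


class Action(Enum):
    NONE = "none"
    REBOOT = "reboot"
    REPLACE = "replace"


def decide(not_ready_minutes: float, already_rebooted: bool) -> Action:
    """Return the action one reconciliation pass would take for a node.

    Assumption: both thresholds count from the moment the node entered
    the NotReady state, and both comparisons are strict.
    """
    if not_ready_minutes > REPLACE_AFTER_MIN:
        return Action.REPLACE
    if not_ready_minutes > REBOOT_AFTER_MIN and not already_rebooted:
        return Action.REBOOT  # the reboot is attempted only once
    return Action.NONE


def simulate(total_minutes: int) -> list[tuple[int, Action]]:
    """Replay the loop for a node that never recovers; list the actions taken."""
    events = []
    rebooted = False
    # Assumption: passes run at minutes 0, 5, 10, ... after the failure.
    for minute in range(0, total_minutes + 1, CHECK_INTERVAL_MIN):
        action = decide(minute, rebooted)
        if action is Action.REBOOT:
            rebooted = True
        if action is not Action.NONE:
            events.append((minute, action))
        if action is Action.REPLACE:
            break  # the node is gone, nothing left to reconcile
    return events
```

Under these assumptions, `simulate(40)` records a reboot on the 20-minute pass and a replacement on the 35-minute pass. Because the loop only runs every five minutes, an action can happen up to five minutes after its threshold is crossed.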
When to enable or disable autoheal
When to enable autoheal
We advise enabling autoheal in production environments, where maintaining high availability and minimizing downtime are critical.
When to disable autoheal
There are scenarios where autoheal should be disabled:
- Testing environments: In testing or development environments where failures can be tolerated for troubleshooting purposes.
- Custom recovery mechanisms: If you have configured custom recovery mechanisms that handle failures in a different way than the autoheal feature.
- Operational control: If you prefer to handle node failures manually, for example to gain a better understanding of how your cluster behaves.