Information
Nodes in a degraded state are an unknown quantity and so may pose a security risk.
Kubernetes Engine's node auto-repair feature helps you keep the nodes in the cluster in a healthy, running state. When enabled, Kubernetes Engine makes periodic checks on the health state of each node in the cluster. If a node fails consecutive health checks over an extended time period, Kubernetes Engine initiates a repair process for that node.
Solution
Using Google Cloud Console
- Go to Kubernetes Engine by visiting:
https://console.cloud.google.com/kubernetes/list
- Select the Kubernetes cluster containing the node pool for which auto-repair is disabled.
- Select the node pool by clicking on the name of the pool.
- Navigate to the Node pool details pane and click EDIT.
- Under the Management heading, check the Enable auto-repair box.
- Click SAVE.
- Repeat steps 2-6 for every cluster and node pool with auto-repair disabled.
Using Command Line
To enable node auto-repair for a node pool in an existing cluster:
gcloud container node-pools update <node_pool_name> --cluster <cluster_name> --zone <compute_zone> --enable-autorepair
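To apply the command above to every node pool in a cluster, the following is a minimal sketch rather than a definitive procedure. The function name enable_autorepair_for_cluster is hypothetical, and the sketch assumes a zonal cluster and that the node pool's management.autoRepair field prints as "True" (in any letter case) when auto-repair is enabled. An empty or "False" value is treated as disabled.

```shell
# Sketch: enable auto-repair on each node pool of one cluster that lacks it.
# enable_autorepair_for_cluster is a hypothetical helper name, not a gcloud feature.
# Usage: enable_autorepair_for_cluster <cluster_name> <compute_zone>
enable_autorepair_for_cluster() {
  cluster="$1"
  zone="$2"

  # List the names of all node pools in the cluster, one per line.
  for pool in $(gcloud container node-pools list \
      --cluster "$cluster" --zone "$zone" --format="value(name)"); do

    # Read the current auto-repair setting and lowercase it for comparison.
    # Assumption: an enabled pool prints "True"; anything else counts as disabled.
    status=$(gcloud container node-pools describe "$pool" \
      --cluster "$cluster" --zone "$zone" \
      --format="value(management.autoRepair)" | tr '[:upper:]' '[:lower:]')

    if [ "$status" = "true" ]; then
      echo "Auto-repair already enabled on node pool: $pool"
    else
      echo "Enabling auto-repair on node pool: $pool"
      gcloud container node-pools update "$pool" \
        --cluster "$cluster" --zone "$zone" --enable-autorepair
    fi
  done
}
```

Calling enable_autorepair_for_cluster <cluster_name> <compute_zone> once per cluster covers all of its node pools. Node pools that already have auto-repair enabled are reported and left unchanged.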
Impact:
If multiple nodes require repair, Kubernetes Engine might repair them in parallel. Kubernetes Engine limits the number of repairs depending on the size of the cluster (bigger clusters have a higher limit) and the number of broken nodes in the cluster (the limit decreases if many nodes are broken).
Node auto-repair is not available on Alpha Clusters.