The issue
When launching a large number of instances, bootstrap failures may occur because the EC2 API is eventually consistent and newly launched instances can take time to be fully reflected in later API calls.
When this happens, the private IP addresses might be missing from the DescribeInstances API response, even though the IPs were already assigned in the RunInstances API response.
This can be seen in the clustermgtd logs:
2026-03-04 17:50:20,107 - [slurm_plugin.instance_manager:get_cluster_instances] - WARNING - Ignoring instance i-1234abcd because not all EC2 info are available, exception: KeyError, message: 'PrivateIpAddress'
The cluster treats this as a bootstrap failure and can enter protected mode and fail cluster creation. For information on protected mode and how to recover from it, refer to this documentation.
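To illustrate the failure mode, here is a minimal sketch of how a caller could tolerate a missing `PrivateIpAddress` by polling instead of failing on the first `KeyError`. It is not the ParallelCluster implementation or its mitigation. The function name `wait_for_private_ips` and its parameters are hypothetical, and the `describe_instances` argument is assumed to be any callable that returns a DescribeInstances-shaped response (for example, the boto3 EC2 client method). Pagination and `InvalidInstanceID.NotFound` errors are not handled.

```python
import time
from typing import Callable, Dict, List


def wait_for_private_ips(
    describe_instances: Callable[..., dict],  # hypothetical: DescribeInstances-like callable
    instance_ids: List[str],
    max_attempts: int = 5,
    delay_seconds: float = 2.0,
) -> Dict[str, str]:
    """Poll until every instance reports a PrivateIpAddress or attempts run out.

    Returns a mapping of instance ID to private IP for the instances that
    reported one; instances still missing an IP are left out of the result.
    """
    found: Dict[str, str] = {}
    pending = list(instance_ids)
    for attempt in range(max_attempts):
        response = describe_instances(InstanceIds=pending)
        for reservation in response.get("Reservations", []):
            for instance in reservation.get("Instances", []):
                # Use .get() so a missing key does not raise KeyError.
                ip = instance.get("PrivateIpAddress")
                if ip:
                    found[instance["InstanceId"]] = ip
        pending = [i for i in instance_ids if i not in found]
        if not pending:
            break
        if attempt < max_attempts - 1:
            time.sleep(delay_seconds)
    return found
```

With boto3 installed and credentials configured, this could be called as `wait_for_private_ips(boto3.client("ec2").describe_instances, ["i-1234abcd"])`. Taking the callable as a parameter keeps the sketch testable without AWS access.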
Affected ParallelCluster versions, OSes and schedulers
- All ParallelCluster versions and OSes
- Slurm scheduler
Mitigation
You can find a detailed explanation and the mitigation of the problem here.