However, my goal is to have every pod during my load test be run on a separate node (which means I’d like my node group to autoscale to the size of parallelism in the CRD).
I tried doing this with anti-affinity groups and couldn’t get it to work.
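For reference, what I tried was roughly the standard required pod anti-affinity keyed on the node hostname (the label here is a placeholder; the actual field placement depends on where the operator exposes the pod template):

```yaml
# Sketch: required anti-affinity so that no two runner pods
# share a node. "app: load-test-runner" is a placeholder label;
# use whatever label the runner pods actually carry.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: load-test-runner
        topologyKey: kubernetes.io/hostname
```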
I then tried using separate: true as outlined in the operator documentation and I’m getting some strange behaviour.
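In other words, something like this in the spec (field names assumed from the operator's documented CRD schema; adjust if yours differs):

```yaml
# Sketch of the relevant CRD fields.
spec:
  parallelism: 10   # one runner pod per unit of parallelism
  separate: true    # schedule each runner pod on its own node
```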
For example, with parallelism set to 10 and an EKS node group with a minimum of 2 and a maximum of 10, setting separate: true creates only one additional node, giving me 3 nodes, and all the remaining pods stay in a "Pending" state.
If I cancel and run it again, the same thing happens: I get one more node, for a total of 4, and all the other pods again remain Pending.
Any idea why this is happening? Would appreciate any help.
It sounds like there’s an issue with your EKS or autoscaler setup. separate: true should have been enough to allocate additional nodes in the scenario you described. I’d recommend checking whether cluster-autoscaler is healthy and what exact reason is given for the FailedScheduling events:
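For example (assuming cluster-autoscaler runs as a deployment in kube-system; `<pod-name>` is one of your pending runner pods, and the label selector may differ depending on how the autoscaler was installed):

```shell
# Why is the pod unschedulable? The FailedScheduling reason is
# printed in the Events section at the bottom of the output.
kubectl describe pod <pod-name>

# Is cluster-autoscaler actually running and healthy?
kubectl -n kube-system get pods -l app=cluster-autoscaler

# What is the autoscaler saying about scale-up decisions?
kubectl -n kube-system logs deployment/cluster-autoscaler --tail=200
```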
Checking whether there are any known issues related to your specific versions of EKS and cluster-autoscaler might also help.